* [next-queue 0/3] i40e: Add an i40e_napi_poll tracepoint
@ 2022-10-05  8:31 ` Joe Damato
  0 siblings, 0 replies; 34+ messages in thread
From: Joe Damato @ 2022-10-05  8:31 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, kuba, davem, anthony.l.nguyen, jesse.brandeburg, Joe Damato

Greetings:

Debugging and tuning the NAPI and i40e NIC parameters can be a bit tricky
as there are many different options to test.

This change adds a tracepoint to i40e_napi_poll which exposes a lot of
helpful debug information for users who'd like to get a better
understanding of how their NIC is performing as they adjust various
parameters and tuning knobs.

Some tweaks to i40e are made along the way to support this, including a
change to i40e_clean_tx_irq (and i40e_clean_xdp_tx_irq, etc.) so that it
returns the actual number of packets processed.
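For example, once the patched driver is loaded you can enable the new
tracepoint from tracefs and pull the per-poll counters out of each line
with awk (paths assume tracefs is mounted at /sys/kernel/tracing; the
sample line below is the one shown in patch 3/3):

```shell
# Enabling the event (requires root and the patched i40e driver):
#   echo 1 > /sys/kernel/tracing/events/i40e/i40e_napi_poll/enable
#   cat /sys/kernel/tracing/trace_pipe
#
# The emitted text is flat key/value pairs, so extracting the fields of
# interest is a one-liner; here, the work counters from a sample line:
line='1029.268 :0/0 i40e:i40e_napi_poll(i40e_napi_poll on dev eth1 q i40e-eth1-TxRx-30 irq 172 irq_mask 00000000,00000000,00000000,00000010,00000000,00000000 curr_cpu 68 budget 64 bpr 64 work_done 0 tx_work_done 2 clean_complete 1 tx_clean_complete 1)'
echo "$line" | awk '{
	for (i = 1; i <= NF; i++)
		if ($i == "work_done" || $i == "tx_work_done")
			printf "%s=%s\n", $i, $(i + 1)
}'
# → work_done=0
#   tx_work_done=2
```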

Thanks,
Joe


Joe Damato (3):
  i40e: Store the irq number in i40e_q_vector
  i40e: i40e_clean_tx_irq returns work done
  i40e: Add i40e_napi_poll tracepoint

 drivers/net/ethernet/intel/i40e/i40e.h       |  1 +
 drivers/net/ethernet/intel/i40e/i40e_main.c  |  1 +
 drivers/net/ethernet/intel/i40e/i40e_trace.h | 50 ++++++++++++++++++++++++++++
 drivers/net/ethernet/intel/i40e/i40e_txrx.c  | 27 +++++++++------
 drivers/net/ethernet/intel/i40e/i40e_xsk.c   | 12 +++----
 drivers/net/ethernet/intel/i40e/i40e_xsk.h   |  2 +-
 6 files changed, 75 insertions(+), 18 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 34+ messages in thread


* [next-queue 1/3] i40e: Store the irq number in i40e_q_vector
  2022-10-05  8:31 ` [Intel-wired-lan] " Joe Damato
@ 2022-10-05  8:31   ` Joe Damato
  -1 siblings, 0 replies; 34+ messages in thread
From: Joe Damato @ 2022-10-05  8:31 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, kuba, davem, anthony.l.nguyen, jesse.brandeburg, Joe Damato

Make it easy to figure out the IRQ number for a particular i40e_q_vector by
storing the assigned IRQ in the structure itself.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h      | 1 +
 drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 9926c4e..8e1f395 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -992,6 +992,7 @@ struct i40e_q_vector {
 	struct rcu_head rcu;	/* to avoid race with update stats on free */
 	char name[I40E_INT_NAME_STR_LEN];
 	bool arm_wb_state;
+	int irq_num;		/* IRQ assigned to this q_vector */
 } ____cacheline_internodealigned_in_smp;
 
 /* lan device */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 6b7535a..6efe130 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -4123,6 +4123,7 @@ static int i40e_vsi_request_irq_msix(struct i40e_vsi *vsi, char *basename)
 		}
 
 		/* register for affinity change notifications */
+		q_vector->irq_num = irq_num;
 		q_vector->affinity_notify.notify = i40e_irq_affinity_notify;
 		q_vector->affinity_notify.release = i40e_irq_affinity_release;
 		irq_set_affinity_notifier(irq_num, &q_vector->affinity_notify);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 34+ messages in thread


* [next-queue 2/3] i40e: i40e_clean_tx_irq returns work done
  2022-10-05  8:31 ` [Intel-wired-lan] " Joe Damato
@ 2022-10-05  8:31   ` Joe Damato
  -1 siblings, 0 replies; 34+ messages in thread
From: Joe Damato @ 2022-10-05  8:31 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, kuba, davem, anthony.l.nguyen, jesse.brandeburg, Joe Damato

Adjust i40e_clean_tx_irq to return the actual number of packets cleaned
and adjust the logic in i40e_napi_poll to check this value.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 24 +++++++++++++-----------
 drivers/net/ethernet/intel/i40e/i40e_xsk.c  | 12 ++++++------
 drivers/net/ethernet/intel/i40e/i40e_xsk.h  |  2 +-
 3 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index b97c95f..ed88309 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -924,10 +924,10 @@ void i40e_detect_recover_hung(struct i40e_vsi *vsi)
  * @tx_ring: Tx ring to clean
  * @napi_budget: Used to determine if we are in netpoll
  *
- * Returns true if there's any budget left (e.g. the clean is finished)
+ * Returns the number of packets cleaned
  **/
-static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
-			      struct i40e_ring *tx_ring, int napi_budget)
+static int i40e_clean_tx_irq(struct i40e_vsi *vsi,
+			     struct i40e_ring *tx_ring, int napi_budget)
 {
 	int i = tx_ring->next_to_clean;
 	struct i40e_tx_buffer *tx_buf;
@@ -1026,7 +1026,7 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
 	i40e_arm_wb(tx_ring, vsi, budget);
 
 	if (ring_is_xdp(tx_ring))
-		return !!budget;
+		return total_packets;
 
 	/* notify netdev of completed buffers */
 	netdev_tx_completed_queue(txring_txq(tx_ring),
@@ -1048,7 +1048,7 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
 		}
 	}
 
-	return !!budget;
+	return total_packets;
 }
 
 /**
@@ -2689,10 +2689,12 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
 			       container_of(napi, struct i40e_q_vector, napi);
 	struct i40e_vsi *vsi = q_vector->vsi;
 	struct i40e_ring *ring;
+	bool tx_clean_complete = true;
 	bool clean_complete = true;
 	bool arm_wb = false;
 	int budget_per_ring;
 	int work_done = 0;
+	int tx_wd = 0;
 
 	if (test_bit(__I40E_VSI_DOWN, vsi->state)) {
 		napi_complete(napi);
@@ -2703,12 +2705,12 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
 	 * budget and be more aggressive about cleaning up the Tx descriptors.
 	 */
 	i40e_for_each_ring(ring, q_vector->tx) {
-		bool wd = ring->xsk_pool ?
-			  i40e_clean_xdp_tx_irq(vsi, ring) :
-			  i40e_clean_tx_irq(vsi, ring, budget);
+		tx_wd = ring->xsk_pool ?
+			i40e_clean_xdp_tx_irq(vsi, ring) :
+			i40e_clean_tx_irq(vsi, ring, budget);
 
-		if (!wd) {
-			clean_complete = false;
+		if (tx_wd >= budget) {
+			tx_clean_complete = false;
 			continue;
 		}
 		arm_wb |= ring->arm_wb;
@@ -2742,7 +2744,7 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
 	}
 
 	/* If work not completed, return budget and polling will return */
-	if (!clean_complete) {
+	if (!clean_complete || !tx_clean_complete) {
 		int cpu_id = smp_processor_id();
 
 		/* It is possible that the interrupt affinity has changed but,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 790aaeff..925682c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -531,9 +531,9 @@ static void i40e_set_rs_bit(struct i40e_ring *xdp_ring)
  * @xdp_ring: XDP Tx ring
  * @budget: NAPI budget
  *
- * Returns true if the work is finished.
+ * Returns number of packets cleaned
  **/
-static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
+static int i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
 {
 	struct xdp_desc *descs = xdp_ring->xsk_pool->tx_descs;
 	u32 nb_pkts, nb_processed = 0;
@@ -541,7 +541,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
 
 	nb_pkts = xsk_tx_peek_release_desc_batch(xdp_ring->xsk_pool, budget);
 	if (!nb_pkts)
-		return true;
+		return 0;
 
 	if (xdp_ring->next_to_use + nb_pkts >= xdp_ring->count) {
 		nb_processed = xdp_ring->count - xdp_ring->next_to_use;
@@ -558,7 +558,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
 
 	i40e_update_tx_stats(xdp_ring, nb_pkts, total_bytes);
 
-	return nb_pkts < budget;
+	return nb_pkts;
 }
 
 /**
@@ -582,9 +582,9 @@ static void i40e_clean_xdp_tx_buffer(struct i40e_ring *tx_ring,
  * @vsi: Current VSI
  * @tx_ring: XDP Tx ring
  *
- * Returns true if cleanup/tranmission is done.
+ * Returns number of packets cleaned
  **/
-bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring)
+int i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring)
 {
 	struct xsk_buff_pool *bp = tx_ring->xsk_pool;
 	u32 i, completed_frames, xsk_frames = 0;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.h b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
index 821df24..4e810c2 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
@@ -30,7 +30,7 @@ int i40e_xsk_pool_setup(struct i40e_vsi *vsi, struct xsk_buff_pool *pool,
 bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 cleaned_count);
 int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget);
 
-bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring);
+int i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring);
 int i40e_xsk_wakeup(struct net_device *dev, u32 queue_id, u32 flags);
 int i40e_realloc_rx_bi_zc(struct i40e_vsi *vsi, bool zc);
 void i40e_clear_rx_bi_zc(struct i40e_ring *rx_ring);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 34+ messages in thread


* [next-queue 3/3] i40e: Add i40e_napi_poll tracepoint
  2022-10-05  8:31 ` [Intel-wired-lan] " Joe Damato
@ 2022-10-05  8:31   ` Joe Damato
  -1 siblings, 0 replies; 34+ messages in thread
From: Joe Damato @ 2022-10-05  8:31 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, kuba, davem, anthony.l.nguyen, jesse.brandeburg, Joe Damato

Add a tracepoint for i40e_napi_poll that gives users detailed
information about the amount of work done during each poll. This
information can help users tune NAPI parameters (such as weight and
budget) and debug NIC coalescing settings such as rx-usecs and
tx-usecs.

An example of the output from this tracepoint:

[...snip...]

1029.268 :0/0 i40e:i40e_napi_poll(i40e_napi_poll on dev eth1 q i40e-eth1-TxRx-30 irq 172 irq_mask 00000000,00000000,00000000,00000010,00000000,00000000 curr_cpu 68 budget 64 bpr 64 work_done 0 tx_work_done 2 clean_complete 1 tx_clean_complete 1)
	i40e_napi_poll ([i40e])
	i40e_napi_poll ([i40e])
	__napi_poll ([kernel.kallsyms])
	net_rx_action ([kernel.kallsyms])
	__do_softirq ([kernel.kallsyms])
	common_interrupt ([kernel.kallsyms])
	asm_common_interrupt ([kernel.kallsyms])
	intel_idle_irq ([kernel.kallsyms])
	cpuidle_enter_state ([kernel.kallsyms])
	cpuidle_enter ([kernel.kallsyms])
	do_idle ([kernel.kallsyms])
	cpu_startup_entry ([kernel.kallsyms])
	[0x243fd8] ([kernel.kallsyms])
	secondary_startup_64_no_verify ([kernel.kallsyms])
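A capture like the one above can be produced with perf (the exact flags
are an assumption, shown for illustration; they require root and the
patched driver). The flat key/value output also makes it easy to filter
for polls that exhausted their budget, modeled here on synthetic lines:

```shell
# Hypothetical capture commands:
#   perf record -e i40e:i40e_napi_poll -a -g -- sleep 1
#   perf script
#
# Filter a dump for polls where either completion flag is 0, i.e. the
# driver returned the full budget and NAPI will poll again:
cat > /tmp/napi_trace.txt <<'EOF'
1029.268 :0/0 i40e:i40e_napi_poll(... work_done 0 tx_work_done 2 clean_complete 1 tx_clean_complete 1)
1029.301 :0/0 i40e:i40e_napi_poll(... work_done 64 tx_work_done 0 clean_complete 0 tx_clean_complete 1)
EOF
grep -E 'clean_complete 0|tx_clean_complete 0' /tmp/napi_trace.txt
# → prints only the 1029.301 line
```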

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 drivers/net/ethernet/intel/i40e/i40e_trace.h | 50 ++++++++++++++++++++++++++++
 drivers/net/ethernet/intel/i40e/i40e_txrx.c  |  3 ++
 2 files changed, 53 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_trace.h b/drivers/net/ethernet/intel/i40e/i40e_trace.h
index b5b1229..779d046 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_trace.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_trace.h
@@ -55,6 +55,56 @@
  * being built from shared code.
  */
 
+#define NO_DEV "(i40e no_device)"
+
+TRACE_EVENT(i40e_napi_poll,
+
+	TP_PROTO(struct napi_struct *napi, struct i40e_q_vector *q, int budget,
+		 int budget_per_ring, int work_done, int tx_work_done, bool clean_complete,
+		 bool tx_clean_complete),
+
+	TP_ARGS(napi, q, budget, budget_per_ring, work_done, tx_work_done,
+		clean_complete, tx_clean_complete),
+
+	TP_STRUCT__entry(
+		__field(int, budget)
+		__field(int, budget_per_ring)
+		__field(int, work_done)
+		__field(int, tx_work_done)
+		__field(int, clean_complete)
+		__field(int, tx_clean_complete)
+		__field(int, irq_num)
+		__field(int, curr_cpu)
+		__string(qname, q->name)
+		__string(dev_name, napi->dev ? napi->dev->name : NO_DEV)
+		__bitmask(irq_affinity,	nr_cpumask_bits)
+	),
+
+	TP_fast_assign(
+		__entry->budget = budget;
+		__entry->budget_per_ring = budget_per_ring;
+		__entry->work_done = work_done;
+		__entry->tx_work_done = tx_work_done;
+		__entry->clean_complete = clean_complete;
+		__entry->tx_clean_complete = tx_clean_complete;
+		__entry->irq_num = q->irq_num;
+		__entry->curr_cpu = get_cpu();
+		__assign_str(qname, q->name);
+		__assign_str(dev_name, napi->dev ? napi->dev->name : NO_DEV);
+		__assign_bitmask(irq_affinity, cpumask_bits(&q->affinity_mask),
+				 nr_cpumask_bits);
+	),
+
+	TP_printk("i40e_napi_poll on dev %s q %s irq %d irq_mask %s curr_cpu %d "
+		  "budget %d bpr %d work_done %d tx_work_done %d "
+		  "clean_complete %d tx_clean_complete %d",
+		__get_str(dev_name), __get_str(qname), __entry->irq_num,
+		__get_bitmask(irq_affinity), __entry->curr_cpu, __entry->budget,
+		__entry->budget_per_ring, __entry->work_done,
+		__entry->tx_work_done,
+		__entry->clean_complete, __entry->tx_clean_complete)
+);
+
 /* Events related to a vsi & ring */
 DECLARE_EVENT_CLASS(
 	i40e_tx_template,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index ed88309..8b72f1b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2743,6 +2743,9 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
 			clean_complete = false;
 	}
 
+	trace_i40e_napi_poll(napi, q_vector, budget, budget_per_ring, work_done, tx_wd,
+			     clean_complete, tx_clean_complete);
+
 	/* If work not completed, return budget and polling will return */
 	if (!clean_complete || !tx_clean_complete) {
 		int cpu_id = smp_processor_id();
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 34+ messages in thread


* Re: [Intel-wired-lan] [next-queue 2/3] i40e: i40e_clean_tx_irq returns work done
  2022-10-05  8:31   ` [Intel-wired-lan] " Joe Damato
@ 2022-10-05  8:45     ` Paul Menzel
  -1 siblings, 0 replies; 34+ messages in thread
From: Paul Menzel @ 2022-10-05  8:45 UTC (permalink / raw)
  To: Joe Damato; +Cc: intel-wired-lan, netdev, kuba, davem

Dear Joe,


Thank you for the patch.

Am 05.10.22 um 10:31 schrieb Joe Damato:
> Adjust i40e_clean_tx_irq to return the actual number of packets cleaned
> and adjust the logic in i40e_napi_poll to check this value.

Nit for the summary/title:

i40e: Return number of cleaned packets in i40e_clean_tx_irq

> Signed-off-by: Joe Damato <jdamato@fastly.com>
> ---
>   drivers/net/ethernet/intel/i40e/i40e_txrx.c | 24 +++++++++++++-----------
>   drivers/net/ethernet/intel/i40e/i40e_xsk.c  | 12 ++++++------
>   drivers/net/ethernet/intel/i40e/i40e_xsk.h  |  2 +-
>   3 files changed, 20 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index b97c95f..ed88309 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -924,10 +924,10 @@ void i40e_detect_recover_hung(struct i40e_vsi *vsi)
>    * @tx_ring: Tx ring to clean
>    * @napi_budget: Used to determine if we are in netpoll
>    *
> - * Returns true if there's any budget left (e.g. the clean is finished)
> + * Returns the number of packets cleaned
>    **/
> -static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
> -			      struct i40e_ring *tx_ring, int napi_budget)
> +static int i40e_clean_tx_irq(struct i40e_vsi *vsi,
> +			     struct i40e_ring *tx_ring, int napi_budget)
>   {
>   	int i = tx_ring->next_to_clean;
>   	struct i40e_tx_buffer *tx_buf;
> @@ -1026,7 +1026,7 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
>   	i40e_arm_wb(tx_ring, vsi, budget);
>   
>   	if (ring_is_xdp(tx_ring))
> -		return !!budget;
> +		return total_packets;
>   
>   	/* notify netdev of completed buffers */
* Re: [Intel-wired-lan] [next-queue 2/3] i40e: i40e_clean_tx_irq returns work done
@ 2022-10-05  8:45     ` Paul Menzel
  0 siblings, 0 replies; 34+ messages in thread
From: Paul Menzel @ 2022-10-05  8:45 UTC (permalink / raw)
  To: Joe Damato; +Cc: netdev, intel-wired-lan, davem, kuba

Dear Joe,


Thank you for the patch.

On 05.10.22 at 10:31, Joe Damato wrote:
> Adjust i40e_clean_tx_irq to return the actual number of packets cleaned
> and adjust the logic in i40e_napi_poll to check this value.

Nit for the summary/title:

i40e: Return number of cleaned packets in i40e_clean_tx_irq

> Signed-off-by: Joe Damato <jdamato@fastly.com>
> ---
>   drivers/net/ethernet/intel/i40e/i40e_txrx.c | 24 +++++++++++++-----------
>   drivers/net/ethernet/intel/i40e/i40e_xsk.c  | 12 ++++++------
>   drivers/net/ethernet/intel/i40e/i40e_xsk.h  |  2 +-
>   3 files changed, 20 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index b97c95f..ed88309 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -924,10 +924,10 @@ void i40e_detect_recover_hung(struct i40e_vsi *vsi)
>    * @tx_ring: Tx ring to clean
>    * @napi_budget: Used to determine if we are in netpoll
>    *
> - * Returns true if there's any budget left (e.g. the clean is finished)
> + * Returns the number of packets cleaned
>    **/
> -static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
> -			      struct i40e_ring *tx_ring, int napi_budget)
> +static int i40e_clean_tx_irq(struct i40e_vsi *vsi,
> +			     struct i40e_ring *tx_ring, int napi_budget)
>   {
>   	int i = tx_ring->next_to_clean;
>   	struct i40e_tx_buffer *tx_buf;
> @@ -1026,7 +1026,7 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
>   	i40e_arm_wb(tx_ring, vsi, budget);
>   
>   	if (ring_is_xdp(tx_ring))
> -		return !!budget;
> +		return total_packets;
>   
>   	/* notify netdev of completed buffers */
>   	netdev_tx_completed_queue(txring_txq(tx_ring),
> @@ -1048,7 +1048,7 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
>   		}
>   	}
>   
> -	return !!budget;
> +	return total_packets;
>   }
>   
>   /**
> @@ -2689,10 +2689,12 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
>   			       container_of(napi, struct i40e_q_vector, napi);
>   	struct i40e_vsi *vsi = q_vector->vsi;
>   	struct i40e_ring *ring;
> +	bool tx_clean_complete = true;
>   	bool clean_complete = true;
>   	bool arm_wb = false;
>   	int budget_per_ring;
>   	int work_done = 0;
> +	int tx_wd = 0;

Is it necessary to initialize the variable?


Kind regards,

Paul

>   
>   	if (test_bit(__I40E_VSI_DOWN, vsi->state)) {
>   		napi_complete(napi);
> @@ -2703,12 +2705,12 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
>   	 * budget and be more aggressive about cleaning up the Tx descriptors.
>   	 */
>   	i40e_for_each_ring(ring, q_vector->tx) {
> -		bool wd = ring->xsk_pool ?
> -			  i40e_clean_xdp_tx_irq(vsi, ring) :
> -			  i40e_clean_tx_irq(vsi, ring, budget);
> +		tx_wd = ring->xsk_pool ?
> +			i40e_clean_xdp_tx_irq(vsi, ring) :
> +			i40e_clean_tx_irq(vsi, ring, budget);
>   
> -		if (!wd) {
> -			clean_complete = false;
> +		if (tx_wd >= budget) {
> +			tx_clean_complete = false;
>   			continue;
>   		}
>   		arm_wb |= ring->arm_wb;
> @@ -2742,7 +2744,7 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
>   	}
>   
>   	/* If work not completed, return budget and polling will return */
> -	if (!clean_complete) {
> +	if (!clean_complete || !tx_clean_complete) {
>   		int cpu_id = smp_processor_id();
>   
>   		/* It is possible that the interrupt affinity has changed but,
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> index 790aaeff..925682c 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> @@ -531,9 +531,9 @@ static void i40e_set_rs_bit(struct i40e_ring *xdp_ring)
>    * @xdp_ring: XDP Tx ring
>    * @budget: NAPI budget
>    *
> - * Returns true if the work is finished.
> + * Returns number of packets cleaned
>    **/
> -static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
> +static int i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
>   {
>   	struct xdp_desc *descs = xdp_ring->xsk_pool->tx_descs;
>   	u32 nb_pkts, nb_processed = 0;
> @@ -541,7 +541,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
>   
>   	nb_pkts = xsk_tx_peek_release_desc_batch(xdp_ring->xsk_pool, budget);
>   	if (!nb_pkts)
> -		return true;
> +		return 0;
>   
>   	if (xdp_ring->next_to_use + nb_pkts >= xdp_ring->count) {
>   		nb_processed = xdp_ring->count - xdp_ring->next_to_use;
> @@ -558,7 +558,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
>   
>   	i40e_update_tx_stats(xdp_ring, nb_pkts, total_bytes);
>   
> -	return nb_pkts < budget;
> +	return nb_pkts;
>   }
>   
>   /**
> @@ -582,9 +582,9 @@ static void i40e_clean_xdp_tx_buffer(struct i40e_ring *tx_ring,
>    * @vsi: Current VSI
>    * @tx_ring: XDP Tx ring
>    *
> - * Returns true if cleanup/tranmission is done.
> + * Returns number of packets cleaned
>    **/
> -bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring)
> +int i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring)
>   {
>   	struct xsk_buff_pool *bp = tx_ring->xsk_pool;
>   	u32 i, completed_frames, xsk_frames = 0;
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.h b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> index 821df24..4e810c2 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> @@ -30,7 +30,7 @@ int i40e_xsk_pool_setup(struct i40e_vsi *vsi, struct xsk_buff_pool *pool,
>   bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 cleaned_count);
>   int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget);
>   
> -bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring);
> +int i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring);
>   int i40e_xsk_wakeup(struct net_device *dev, u32 queue_id, u32 flags);
>   int i40e_realloc_rx_bi_zc(struct i40e_vsi *vsi, bool zc);
>   void i40e_clear_rx_bi_zc(struct i40e_ring *rx_ring);
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan


* Re: [next-queue 3/3] i40e: Add i40e_napi_poll tracepoint
  2022-10-05  8:31   ` [Intel-wired-lan] " Joe Damato
@ 2022-10-05 10:27     ` Maciej Fijalkowski
  -1 siblings, 0 replies; 34+ messages in thread
From: Maciej Fijalkowski @ 2022-10-05 10:27 UTC (permalink / raw)
  To: Joe Damato
  Cc: intel-wired-lan, netdev, kuba, davem, anthony.l.nguyen, jesse.brandeburg

On Wed, Oct 05, 2022 at 01:31:43AM -0700, Joe Damato wrote:

Hi Joe,

> Add a tracepoint for i40e_napi_poll that allows users to get detailed
> information about the amount of work done. This information can help users
> better tune the correct NAPI parameters (like weight and budget), as well
> as debug NIC settings like rx-usecs and tx-usecs, etc.
> 
> An example of the output from this tracepoint:
> 
> [...snip...]
> 
> 1029.268 :0/0 i40e:i40e_napi_poll(i40e_napi_poll on dev eth1 q
> i40e-eth1-TxRx-30 irq 172 irq_mask
> 00000000,00000000,00000000,00000010,00000000,00000000 curr_cpu 68 budget
> 64 bpr 64 work_done 0 tx_work_done 2 clean_complete 1 tx_clean_complete
> 1)
> 	i40e_napi_poll ([i40e])
> 	i40e_napi_poll ([i40e])
> 	__napi_poll ([kernel.kallsyms])
> 	net_rx_action ([kernel.kallsyms])
> 	__do_softirq ([kernel.kallsyms])
> 	common_interrupt ([kernel.kallsyms])
> 	asm_common_interrupt ([kernel.kallsyms])
> 	intel_idle_irq ([kernel.kallsyms])
> 	cpuidle_enter_state ([kernel.kallsyms])
> 	cpuidle_enter ([kernel.kallsyms])
> 	do_idle ([kernel.kallsyms])
> 	cpu_startup_entry ([kernel.kallsyms])
> 	[0x243fd8] ([kernel.kallsyms])
> 	secondary_startup_64_no_verify ([kernel.kallsyms])

Maybe you could also include how to configure this tracepoint, for future
readers?

> 
> Signed-off-by: Joe Damato <jdamato@fastly.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_trace.h | 50 ++++++++++++++++++++++++++++
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c  |  3 ++
>  2 files changed, 53 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_trace.h b/drivers/net/ethernet/intel/i40e/i40e_trace.h
> index b5b1229..779d046 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_trace.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_trace.h
> @@ -55,6 +55,56 @@
>   * being built from shared code.
>   */
>  
> +#define NO_DEV "(i40e no_device)"
> +
> +TRACE_EVENT(i40e_napi_poll,
> +
> +	TP_PROTO(struct napi_struct *napi, struct i40e_q_vector *q, int budget,
> +		 int budget_per_ring, int work_done, int tx_work_done, bool clean_complete,
> +		 bool tx_clean_complete),
> +
> +	TP_ARGS(napi, q, budget, budget_per_ring, work_done, tx_work_done,
> +		clean_complete, tx_clean_complete),
> +
> +	TP_STRUCT__entry(
> +		__field(int, budget)
> +		__field(int, budget_per_ring)
> +		__field(int, work_done)
> +		__field(int, tx_work_done)
> +		__field(int, clean_complete)
> +		__field(int, tx_clean_complete)
> +		__field(int, irq_num)
> +		__field(int, curr_cpu)
> +		__string(qname, q->name)
> +		__string(dev_name, napi->dev ? napi->dev->name : NO_DEV)
> +		__bitmask(irq_affinity,	nr_cpumask_bits)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->budget = budget;
> +		__entry->budget_per_ring = budget_per_ring;
> +		__entry->work_done = work_done;

What if the Rx clean routines failed to allocate new Rx buffers? Then
this value would be misinterpreted. Maybe we should change the API to

static bool
i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget,
		  unsigned int *processed_pkts);

so you would return failure and at the end do
	*processed_pkts = total_rx_packets;

I would also change the naming of the tracepoint entry. I'm not a native
English speaker, but having 'done' in the variable name suggests to me
that it is a boolean. What about something like 'rx_cleaned_pkts'
instead?

Generally I think this is useful; personally I have needed to trace the
next_to_clean and next_to_use ring indexes a lot, but that is probably
out of scope here.

> +		__entry->tx_work_done = tx_work_done;
> +		__entry->clean_complete = clean_complete;
> +		__entry->tx_clean_complete = tx_clean_complete;
> +		__entry->irq_num = q->irq_num;
> +		__entry->curr_cpu = get_cpu();
> +		__assign_str(qname, q->name);
> +		__assign_str(dev_name, napi->dev ? napi->dev->name : NO_DEV);
> +		__assign_bitmask(irq_affinity, cpumask_bits(&q->affinity_mask),
> +				 nr_cpumask_bits);
> +	),
> +
> +	TP_printk("i40e_napi_poll on dev %s q %s irq %d irq_mask %s curr_cpu %d "
> +		  "budget %d bpr %d work_done %d tx_work_done %d "
> +		  "clean_complete %d tx_clean_complete %d",
> +		__get_str(dev_name), __get_str(qname), __entry->irq_num,
> +		__get_bitmask(irq_affinity), __entry->curr_cpu, __entry->budget,
> +		__entry->budget_per_ring, __entry->work_done,
> +		__entry->tx_work_done,
> +		__entry->clean_complete, __entry->tx_clean_complete)
> +);
> +
>  /* Events related to a vsi & ring */
>  DECLARE_EVENT_CLASS(
>  	i40e_tx_template,
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index ed88309..8b72f1b 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -2743,6 +2743,9 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
>  			clean_complete = false;
>  	}
>  
> +	trace_i40e_napi_poll(napi, q_vector, budget, budget_per_ring, work_done, tx_wd,
> +			     clean_complete, tx_clean_complete);
> +
>  	/* If work not completed, return budget and polling will return */
>  	if (!clean_complete || !tx_clean_complete) {
>  		int cpu_id = smp_processor_id();
> -- 
> 2.7.4
> 



* Re: [next-queue 1/3] i40e: Store the irq number in i40e_q_vector
  2022-10-05  8:31   ` [Intel-wired-lan] " Joe Damato
@ 2022-10-05 10:29     ` Maciej Fijalkowski
  -1 siblings, 0 replies; 34+ messages in thread
From: Maciej Fijalkowski @ 2022-10-05 10:29 UTC (permalink / raw)
  To: Joe Damato
  Cc: intel-wired-lan, netdev, kuba, davem, anthony.l.nguyen, jesse.brandeburg

On Wed, Oct 05, 2022 at 01:31:41AM -0700, Joe Damato wrote:
> Make it easy to figure out the IRQ number for a particular i40e_q_vector by
> storing the assigned IRQ in the structure itself.
> 
> Signed-off-by: Joe Damato <jdamato@fastly.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e.h      | 1 +
>  drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
> index 9926c4e..8e1f395 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e.h
> @@ -992,6 +992,7 @@ struct i40e_q_vector {
>  	struct rcu_head rcu;	/* to avoid race with update stats on free */
>  	char name[I40E_INT_NAME_STR_LEN];
>  	bool arm_wb_state;
> +	int irq_num;		/* IRQ assigned to this q_vector */

This struct looks like a mess in terms of member order. Can you check
with pahole how your patch affects its layout? Maybe while at it you
could pack it in a better way?
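For anyone following along, struct layout, holes, and padding can be
inspected with pahole from the dwarves package against an object built
with debug info; the module path below is an example, not taken from the
patch:

```shell
# Show the layout of i40e_q_vector, including holes and padding, for a
# module build with CONFIG_DEBUG_INFO enabled.
pahole -C i40e_q_vector drivers/net/ethernet/intel/i40e/i40e.ko

# Or against vmlinux if the driver is built in.
pahole -C i40e_q_vector vmlinux
```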

>  } ____cacheline_internodealigned_in_smp;
>  
>  /* lan device */
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 6b7535a..6efe130 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -4123,6 +4123,7 @@ static int i40e_vsi_request_irq_msix(struct i40e_vsi *vsi, char *basename)
>  		}
>  
>  		/* register for affinity change notifications */
> +		q_vector->irq_num = irq_num;
>  		q_vector->affinity_notify.notify = i40e_irq_affinity_notify;
>  		q_vector->affinity_notify.release = i40e_irq_affinity_release;
>  		irq_set_affinity_notifier(irq_num, &q_vector->affinity_notify);
> -- 
> 2.7.4
> 



* Re: [next-queue 2/3] i40e: i40e_clean_tx_irq returns work done
  2022-10-05  8:31   ` [Intel-wired-lan] " Joe Damato
@ 2022-10-05 10:46     ` Maciej Fijalkowski
  -1 siblings, 0 replies; 34+ messages in thread
From: Maciej Fijalkowski @ 2022-10-05 10:46 UTC (permalink / raw)
  To: Joe Damato
  Cc: intel-wired-lan, netdev, kuba, davem, anthony.l.nguyen, jesse.brandeburg

On Wed, Oct 05, 2022 at 01:31:42AM -0700, Joe Damato wrote:
> Adjust i40e_clean_tx_irq to return the actual number of packets cleaned
> and adjust the logic in i40e_napi_poll to check this value.
> 
> Signed-off-by: Joe Damato <jdamato@fastly.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c | 24 +++++++++++++-----------
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c  | 12 ++++++------
>  drivers/net/ethernet/intel/i40e/i40e_xsk.h  |  2 +-
>  3 files changed, 20 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index b97c95f..ed88309 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -924,10 +924,10 @@ void i40e_detect_recover_hung(struct i40e_vsi *vsi)
>   * @tx_ring: Tx ring to clean
>   * @napi_budget: Used to determine if we are in netpoll
>   *
> - * Returns true if there's any budget left (e.g. the clean is finished)
> + * Returns the number of packets cleaned
>   **/
> -static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
> -			      struct i40e_ring *tx_ring, int napi_budget)
> +static int i40e_clean_tx_irq(struct i40e_vsi *vsi,
> +			     struct i40e_ring *tx_ring, int napi_budget)
>  {
>  	int i = tx_ring->next_to_clean;
>  	struct i40e_tx_buffer *tx_buf;
> @@ -1026,7 +1026,7 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
>  	i40e_arm_wb(tx_ring, vsi, budget);
>  
>  	if (ring_is_xdp(tx_ring))
> -		return !!budget;
> +		return total_packets;
>  
>  	/* notify netdev of completed buffers */
>  	netdev_tx_completed_queue(txring_txq(tx_ring),
> @@ -1048,7 +1048,7 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
>  		}
>  	}
>  
> -	return !!budget;
> +	return total_packets;
>  }
>  
>  /**
> @@ -2689,10 +2689,12 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
>  			       container_of(napi, struct i40e_q_vector, napi);
>  	struct i40e_vsi *vsi = q_vector->vsi;
>  	struct i40e_ring *ring;
> +	bool tx_clean_complete = true;
>  	bool clean_complete = true;
>  	bool arm_wb = false;
>  	int budget_per_ring;
>  	int work_done = 0;
> +	int tx_wd = 0;
>  
>  	if (test_bit(__I40E_VSI_DOWN, vsi->state)) {
>  		napi_complete(napi);
> @@ -2703,12 +2705,12 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
>  	 * budget and be more aggressive about cleaning up the Tx descriptors.
>  	 */
>  	i40e_for_each_ring(ring, q_vector->tx) {
> -		bool wd = ring->xsk_pool ?
> -			  i40e_clean_xdp_tx_irq(vsi, ring) :
> -			  i40e_clean_tx_irq(vsi, ring, budget);
> +		tx_wd = ring->xsk_pool ?
> +			i40e_clean_xdp_tx_irq(vsi, ring) :
> +			i40e_clean_tx_irq(vsi, ring, budget);
>  
> -		if (!wd) {
> -			clean_complete = false;
> +		if (tx_wd >= budget) {
> +			tx_clean_complete = false;

This will break for AF_XDP Tx ZC. AF_XDP Tx ZC in intel drivers ignores
budget given by NAPI. If you look at i40e_xmit_zc():

func def:
static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)

callsite:
	return i40e_xmit_zc(tx_ring, I40E_DESC_UNUSED(tx_ring));

we give free ring space as a budget and with your change we would be
returning the amount of processed tx descriptors which you will be
comparing against NAPI budget (64, unless you have busy poll enabled with
a different batch size). Say you start with empty ring and your HW rings
are sized to 1k but there was only 512 AF_XDP descriptors ready for Tx.
You produced all of them successfully to ring and you return 512 up to
i40e_napi_poll.

>  			continue;
>  		}
>  		arm_wb |= ring->arm_wb;
> @@ -2742,7 +2744,7 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
>  	}
>  
>  	/* If work not completed, return budget and polling will return */
> -	if (!clean_complete) {
> +	if (!clean_complete || !tx_clean_complete) {
>  		int cpu_id = smp_processor_id();
>  
>  		/* It is possible that the interrupt affinity has changed but,
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> index 790aaeff..925682c 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> @@ -531,9 +531,9 @@ static void i40e_set_rs_bit(struct i40e_ring *xdp_ring)
>   * @xdp_ring: XDP Tx ring
>   * @budget: NAPI budget
>   *
> - * Returns true if the work is finished.
> + * Returns number of packets cleaned
>   **/
> -static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
> +static int i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
>  {
>  	struct xdp_desc *descs = xdp_ring->xsk_pool->tx_descs;
>  	u32 nb_pkts, nb_processed = 0;
> @@ -541,7 +541,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
>  
>  	nb_pkts = xsk_tx_peek_release_desc_batch(xdp_ring->xsk_pool, budget);
>  	if (!nb_pkts)
> -		return true;
> +		return 0;
>  
>  	if (xdp_ring->next_to_use + nb_pkts >= xdp_ring->count) {
>  		nb_processed = xdp_ring->count - xdp_ring->next_to_use;
> @@ -558,7 +558,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
>  
>  	i40e_update_tx_stats(xdp_ring, nb_pkts, total_bytes);
>  
> -	return nb_pkts < budget;
> +	return nb_pkts;
>  }
>  
>  /**
> @@ -582,9 +582,9 @@ static void i40e_clean_xdp_tx_buffer(struct i40e_ring *tx_ring,
>   * @vsi: Current VSI
>   * @tx_ring: XDP Tx ring
>   *
> - * Returns true if cleanup/tranmission is done.
> + * Returns number of packets cleaned
>   **/
> -bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring)
> +int i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring)
>  {
>  	struct xsk_buff_pool *bp = tx_ring->xsk_pool;
>  	u32 i, completed_frames, xsk_frames = 0;
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.h b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> index 821df24..4e810c2 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> @@ -30,7 +30,7 @@ int i40e_xsk_pool_setup(struct i40e_vsi *vsi, struct xsk_buff_pool *pool,
>  bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 cleaned_count);
>  int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget);
>  
> -bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring);
> +int i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring);
>  int i40e_xsk_wakeup(struct net_device *dev, u32 queue_id, u32 flags);
>  int i40e_realloc_rx_bi_zc(struct i40e_vsi *vsi, bool zc);
>  void i40e_clear_rx_bi_zc(struct i40e_ring *rx_ring);
> -- 
> 2.7.4
> 


> +	bool tx_clean_complete = true;
>  	bool clean_complete = true;
>  	bool arm_wb = false;
>  	int budget_per_ring;
>  	int work_done = 0;
> +	int tx_wd = 0;
>  
>  	if (test_bit(__I40E_VSI_DOWN, vsi->state)) {
>  		napi_complete(napi);
> @@ -2703,12 +2705,12 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
>  	 * budget and be more aggressive about cleaning up the Tx descriptors.
>  	 */
>  	i40e_for_each_ring(ring, q_vector->tx) {
> -		bool wd = ring->xsk_pool ?
> -			  i40e_clean_xdp_tx_irq(vsi, ring) :
> -			  i40e_clean_tx_irq(vsi, ring, budget);
> +		tx_wd = ring->xsk_pool ?
> +			i40e_clean_xdp_tx_irq(vsi, ring) :
> +			i40e_clean_tx_irq(vsi, ring, budget);
>  
> -		if (!wd) {
> -			clean_complete = false;
> +		if (tx_wd >= budget) {
> +			tx_clean_complete = false;

This will break for AF_XDP Tx ZC. AF_XDP Tx ZC in intel drivers ignores the
budget given by NAPI. If you look at i40e_xmit_zc():

func def:
static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)

callsite:
	return i40e_xmit_zc(tx_ring, I40E_DESC_UNUSED(tx_ring));

We give the free ring space as the budget, so with your change we would be
returning the number of processed tx descriptors, which you will then be
comparing against the NAPI budget (64, unless you have busy poll enabled with
a different batch size). Say you start with an empty ring and your HW rings
are sized to 1k, but there were only 512 AF_XDP descriptors ready for Tx.
You produced all of them successfully to the ring and you return 512 up to
i40e_napi_poll.

>  			continue;
>  		}
>  		arm_wb |= ring->arm_wb;
> @@ -2742,7 +2744,7 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
>  	}
>  
>  	/* If work not completed, return budget and polling will return */
> -	if (!clean_complete) {
> +	if (!clean_complete || !tx_clean_complete) {
>  		int cpu_id = smp_processor_id();
>  
>  		/* It is possible that the interrupt affinity has changed but,
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> index 790aaeff..925682c 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> @@ -531,9 +531,9 @@ static void i40e_set_rs_bit(struct i40e_ring *xdp_ring)
>   * @xdp_ring: XDP Tx ring
>   * @budget: NAPI budget
>   *
> - * Returns true if the work is finished.
> + * Returns number of packets cleaned
>   **/
> -static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
> +static int i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
>  {
>  	struct xdp_desc *descs = xdp_ring->xsk_pool->tx_descs;
>  	u32 nb_pkts, nb_processed = 0;
> @@ -541,7 +541,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
>  
>  	nb_pkts = xsk_tx_peek_release_desc_batch(xdp_ring->xsk_pool, budget);
>  	if (!nb_pkts)
> -		return true;
> +		return 0;
>  
>  	if (xdp_ring->next_to_use + nb_pkts >= xdp_ring->count) {
>  		nb_processed = xdp_ring->count - xdp_ring->next_to_use;
> @@ -558,7 +558,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
>  
>  	i40e_update_tx_stats(xdp_ring, nb_pkts, total_bytes);
>  
> -	return nb_pkts < budget;
> +	return nb_pkts;
>  }
>  
>  /**
> @@ -582,9 +582,9 @@ static void i40e_clean_xdp_tx_buffer(struct i40e_ring *tx_ring,
>   * @vsi: Current VSI
>   * @tx_ring: XDP Tx ring
>   *
> - * Returns true if cleanup/tranmission is done.
> + * Returns number of packets cleaned
>   **/
> -bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring)
> +int i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring)
>  {
>  	struct xsk_buff_pool *bp = tx_ring->xsk_pool;
>  	u32 i, completed_frames, xsk_frames = 0;
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.h b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> index 821df24..4e810c2 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> @@ -30,7 +30,7 @@ int i40e_xsk_pool_setup(struct i40e_vsi *vsi, struct xsk_buff_pool *pool,
>  bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 cleaned_count);
>  int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget);
>  
> -bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring);
> +int i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring);
>  int i40e_xsk_wakeup(struct net_device *dev, u32 queue_id, u32 flags);
>  int i40e_realloc_rx_bi_zc(struct i40e_vsi *vsi, bool zc);
>  void i40e_clear_rx_bi_zc(struct i40e_ring *rx_ring);
> -- 
> 2.7.4
> 
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [next-queue 1/3] i40e: Store the irq number in i40e_q_vector
  2022-10-05 10:29     ` [Intel-wired-lan] " Maciej Fijalkowski
@ 2022-10-05 17:00       ` Joe Damato
  -1 siblings, 0 replies; 34+ messages in thread
From: Joe Damato @ 2022-10-05 17:00 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: intel-wired-lan, netdev, kuba, davem, anthony.l.nguyen, jesse.brandeburg

On Wed, Oct 05, 2022 at 12:29:24PM +0200, Maciej Fijalkowski wrote:
> On Wed, Oct 05, 2022 at 01:31:41AM -0700, Joe Damato wrote:
> > Make it easy to figure out the IRQ number for a particular i40e_q_vector by
> > storing the assigned IRQ in the structure itself.
> > 
> > Signed-off-by: Joe Damato <jdamato@fastly.com>
> > ---
> >  drivers/net/ethernet/intel/i40e/i40e.h      | 1 +
> >  drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
> >  2 files changed, 2 insertions(+)
> > 
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
> > index 9926c4e..8e1f395 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e.h
> > +++ b/drivers/net/ethernet/intel/i40e/i40e.h
> > @@ -992,6 +992,7 @@ struct i40e_q_vector {
> >  	struct rcu_head rcu;	/* to avoid race with update stats on free */
> >  	char name[I40E_INT_NAME_STR_LEN];
> >  	bool arm_wb_state;
> > +	int irq_num;		/* IRQ assigned to this q_vector */
> 
This struct looks like a mess in terms of member order. Can you check
with pahole how your patch affects its layout? Maybe while you're at it
you could pack it in a better way?

OK, sure. I used pahole and asked it to reorganize the struct members,
which saves 24 bytes.

I'll update this commit to include the following reorganization in the v2 of
this set:

$ pahole -R -C i40e_q_vector i40e.ko

struct i40e_q_vector {
	struct i40e_vsi *          vsi;                  /*     0     8 */
	u16                        v_idx;                /*     8     2 */
	u16                        reg_idx;              /*    10     2 */
	u8                         num_ringpairs;        /*    12     1 */
	u8                         itr_countdown;        /*    13     1 */
	bool                       arm_wb_state;         /*    14     1 */

	/* XXX 1 byte hole, try to pack */

	struct napi_struct         napi;                 /*    16   400 */
	/* --- cacheline 6 boundary (384 bytes) was 32 bytes ago --- */
	struct i40e_ring_container rx;                   /*   416    32 */
	/* --- cacheline 7 boundary (448 bytes) --- */
	struct i40e_ring_container tx;                   /*   448    32 */
	cpumask_t                  affinity_mask;        /*   480    24 */
	struct irq_affinity_notify affinity_notify;      /*   504    56 */
	/* --- cacheline 8 boundary (512 bytes) was 48 bytes ago --- */
	struct callback_head       rcu;                  /*   560    16 */
	/* --- cacheline 9 boundary (576 bytes) --- */
	char                       name[32];             /*   576    32 */

	/* XXX 4 bytes hole, try to pack */

	int                        irq_num;              /*   612     4 */

	/* size: 616, cachelines: 10, members: 14 */
	/* sum members: 611, holes: 2, sum holes: 5 */
	/* last cacheline: 40 bytes */
};   /* saved 24 bytes! */

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [next-queue 1/3] i40e: Store the irq number in i40e_q_vector
@ 2022-10-05 17:00       ` Joe Damato
  0 siblings, 0 replies; 34+ messages in thread
From: Joe Damato @ 2022-10-05 17:00 UTC (permalink / raw)
  To: Maciej Fijalkowski; +Cc: netdev, kuba, intel-wired-lan, davem

On Wed, Oct 05, 2022 at 12:29:24PM +0200, Maciej Fijalkowski wrote:
> On Wed, Oct 05, 2022 at 01:31:41AM -0700, Joe Damato wrote:
> > Make it easy to figure out the IRQ number for a particular i40e_q_vector by
> > storing the assigned IRQ in the structure itself.
> > 
> > Signed-off-by: Joe Damato <jdamato@fastly.com>
> > ---
> >  drivers/net/ethernet/intel/i40e/i40e.h      | 1 +
> >  drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
> >  2 files changed, 2 insertions(+)
> > 
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
> > index 9926c4e..8e1f395 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e.h
> > +++ b/drivers/net/ethernet/intel/i40e/i40e.h
> > @@ -992,6 +992,7 @@ struct i40e_q_vector {
> >  	struct rcu_head rcu;	/* to avoid race with update stats on free */
> >  	char name[I40E_INT_NAME_STR_LEN];
> >  	bool arm_wb_state;
> > +	int irq_num;		/* IRQ assigned to this q_vector */
> 
> This struct looks like a mess in terms of member order. Can you check
> with pahole how your patch affects its layout? Maybe while you're at it
> you could pack it in a better way?

OK, sure. I used pahole and asked it to reorganize the struct members,
which saves 24 bytes.

I'll update this commit to include the following reorganization in the v2 of
this set:

$ pahole -R -C i40e_q_vector i40e.ko

struct i40e_q_vector {
	struct i40e_vsi *          vsi;                  /*     0     8 */
	u16                        v_idx;                /*     8     2 */
	u16                        reg_idx;              /*    10     2 */
	u8                         num_ringpairs;        /*    12     1 */
	u8                         itr_countdown;        /*    13     1 */
	bool                       arm_wb_state;         /*    14     1 */

	/* XXX 1 byte hole, try to pack */

	struct napi_struct         napi;                 /*    16   400 */
	/* --- cacheline 6 boundary (384 bytes) was 32 bytes ago --- */
	struct i40e_ring_container rx;                   /*   416    32 */
	/* --- cacheline 7 boundary (448 bytes) --- */
	struct i40e_ring_container tx;                   /*   448    32 */
	cpumask_t                  affinity_mask;        /*   480    24 */
	struct irq_affinity_notify affinity_notify;      /*   504    56 */
	/* --- cacheline 8 boundary (512 bytes) was 48 bytes ago --- */
	struct callback_head       rcu;                  /*   560    16 */
	/* --- cacheline 9 boundary (576 bytes) --- */
	char                       name[32];             /*   576    32 */

	/* XXX 4 bytes hole, try to pack */

	int                        irq_num;              /*   612     4 */

	/* size: 616, cachelines: 10, members: 14 */
	/* sum members: 611, holes: 2, sum holes: 5 */
	/* last cacheline: 40 bytes */
};   /* saved 24 bytes! */

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [next-queue 2/3] i40e: i40e_clean_tx_irq returns work done
  2022-10-05 10:46     ` [Intel-wired-lan] " Maciej Fijalkowski
@ 2022-10-05 17:50       ` Joe Damato
  -1 siblings, 0 replies; 34+ messages in thread
From: Joe Damato @ 2022-10-05 17:50 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: intel-wired-lan, netdev, kuba, davem, anthony.l.nguyen, jesse.brandeburg

On Wed, Oct 05, 2022 at 12:46:31PM +0200, Maciej Fijalkowski wrote:
> On Wed, Oct 05, 2022 at 01:31:42AM -0700, Joe Damato wrote:
> > Adjust i40e_clean_tx_irq to return the actual number of packets cleaned
> > and adjust the logic in i40e_napi_poll to check this value.
> > 
> > Signed-off-by: Joe Damato <jdamato@fastly.com>
> > ---
> >  drivers/net/ethernet/intel/i40e/i40e_txrx.c | 24 +++++++++++++-----------
> >  drivers/net/ethernet/intel/i40e/i40e_xsk.c  | 12 ++++++------
> >  drivers/net/ethernet/intel/i40e/i40e_xsk.h  |  2 +-
> >  3 files changed, 20 insertions(+), 18 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> > index b97c95f..ed88309 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> > @@ -924,10 +924,10 @@ void i40e_detect_recover_hung(struct i40e_vsi *vsi)
> >   * @tx_ring: Tx ring to clean
> >   * @napi_budget: Used to determine if we are in netpoll
> >   *
> > - * Returns true if there's any budget left (e.g. the clean is finished)
> > + * Returns the number of packets cleaned
> >   **/
> > -static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
> > -			      struct i40e_ring *tx_ring, int napi_budget)
> > +static int i40e_clean_tx_irq(struct i40e_vsi *vsi,
> > +			     struct i40e_ring *tx_ring, int napi_budget)
> >  {
> >  	int i = tx_ring->next_to_clean;
> >  	struct i40e_tx_buffer *tx_buf;
> > @@ -1026,7 +1026,7 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
> >  	i40e_arm_wb(tx_ring, vsi, budget);
> >  
> >  	if (ring_is_xdp(tx_ring))
> > -		return !!budget;
> > +		return total_packets;
> >  
> >  	/* notify netdev of completed buffers */
> >  	netdev_tx_completed_queue(txring_txq(tx_ring),
> > @@ -1048,7 +1048,7 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
> >  		}
> >  	}
> >  
> > -	return !!budget;
> > +	return total_packets;
> >  }
> >  
> >  /**
> > @@ -2689,10 +2689,12 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
> >  			       container_of(napi, struct i40e_q_vector, napi);
> >  	struct i40e_vsi *vsi = q_vector->vsi;
> >  	struct i40e_ring *ring;
> > +	bool tx_clean_complete = true;
> >  	bool clean_complete = true;
> >  	bool arm_wb = false;
> >  	int budget_per_ring;
> >  	int work_done = 0;
> > +	int tx_wd = 0;
> >  
> >  	if (test_bit(__I40E_VSI_DOWN, vsi->state)) {
> >  		napi_complete(napi);
> > @@ -2703,12 +2705,12 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
> >  	 * budget and be more aggressive about cleaning up the Tx descriptors.
> >  	 */
> >  	i40e_for_each_ring(ring, q_vector->tx) {
> > -		bool wd = ring->xsk_pool ?
> > -			  i40e_clean_xdp_tx_irq(vsi, ring) :
> > -			  i40e_clean_tx_irq(vsi, ring, budget);
> > +		tx_wd = ring->xsk_pool ?
> > +			i40e_clean_xdp_tx_irq(vsi, ring) :
> > +			i40e_clean_tx_irq(vsi, ring, budget);
> >  
> > -		if (!wd) {
> > -			clean_complete = false;
> > +		if (tx_wd >= budget) {
> > +			tx_clean_complete = false;
> 
> This will break for AF_XDP Tx ZC. AF_XDP Tx ZC in intel drivers ignores the
> budget given by NAPI. If you look at i40e_xmit_zc():
> 
> func def:
> static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
> 
> callsite:
> 	return i40e_xmit_zc(tx_ring, I40E_DESC_UNUSED(tx_ring));
> 
> We give the free ring space as the budget, so with your change we would be
> returning the number of processed tx descriptors, which you will then be
> comparing against the NAPI budget (64, unless you have busy poll enabled with
> a different batch size). Say you start with an empty ring and your HW rings
> are sized to 1k, but there were only 512 AF_XDP descriptors ready for Tx.
> You produced all of them successfully to the ring and you return 512 up to
> i40e_napi_poll.

Good point, my bad.

I've reworked this for the v2 and have given i40e_clean_tx_irq
and i40e_clean_xdp_tx_irq an out parameter which records the number of
TX packets cleaned.

I tweaked i40e_xmit_zc to return the number of packets (nb_pkts) and moved
the boolean check that it is under the "budget"
(I40E_DESC_UNUSED(tx_ring)) into i40e_clean_xdp_tx_irq.

I think that might solve the issues you've described.


> >  			continue;
> >  		}
> >  		arm_wb |= ring->arm_wb;
> > @@ -2742,7 +2744,7 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
> >  	}
> >  
> >  	/* If work not completed, return budget and polling will return */
> > -	if (!clean_complete) {
> > +	if (!clean_complete || !tx_clean_complete) {
> >  		int cpu_id = smp_processor_id();
> >  
> >  		/* It is possible that the interrupt affinity has changed but,
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> > index 790aaeff..925682c 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> > @@ -531,9 +531,9 @@ static void i40e_set_rs_bit(struct i40e_ring *xdp_ring)
> >   * @xdp_ring: XDP Tx ring
> >   * @budget: NAPI budget
> >   *
> > - * Returns true if the work is finished.
> > + * Returns number of packets cleaned
> >   **/
> > -static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
> > +static int i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
> >  {
> >  	struct xdp_desc *descs = xdp_ring->xsk_pool->tx_descs;
> >  	u32 nb_pkts, nb_processed = 0;
> > @@ -541,7 +541,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
> >  
> >  	nb_pkts = xsk_tx_peek_release_desc_batch(xdp_ring->xsk_pool, budget);
> >  	if (!nb_pkts)
> > -		return true;
> > +		return 0;
> >  
> >  	if (xdp_ring->next_to_use + nb_pkts >= xdp_ring->count) {
> >  		nb_processed = xdp_ring->count - xdp_ring->next_to_use;
> > @@ -558,7 +558,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
> >  
> >  	i40e_update_tx_stats(xdp_ring, nb_pkts, total_bytes);
> >  
> > -	return nb_pkts < budget;
> > +	return nb_pkts;
> >  }
> >  
> >  /**
> > @@ -582,9 +582,9 @@ static void i40e_clean_xdp_tx_buffer(struct i40e_ring *tx_ring,
> >   * @vsi: Current VSI
> >   * @tx_ring: XDP Tx ring
> >   *
> > - * Returns true if cleanup/tranmission is done.
> > + * Returns number of packets cleaned
> >   **/
> > -bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring)
> > +int i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring)
> >  {
> >  	struct xsk_buff_pool *bp = tx_ring->xsk_pool;
> >  	u32 i, completed_frames, xsk_frames = 0;
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.h b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> > index 821df24..4e810c2 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> > @@ -30,7 +30,7 @@ int i40e_xsk_pool_setup(struct i40e_vsi *vsi, struct xsk_buff_pool *pool,
> >  bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 cleaned_count);
> >  int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget);
> >  
> > -bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring);
> > +int i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring);
> >  int i40e_xsk_wakeup(struct net_device *dev, u32 queue_id, u32 flags);
> >  int i40e_realloc_rx_bi_zc(struct i40e_vsi *vsi, bool zc);
> >  void i40e_clear_rx_bi_zc(struct i40e_ring *rx_ring);
> > -- 
> > 2.7.4
> > 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [next-queue 2/3] i40e: i40e_clean_tx_irq returns work done
@ 2022-10-05 17:50       ` Joe Damato
  0 siblings, 0 replies; 34+ messages in thread
From: Joe Damato @ 2022-10-05 17:50 UTC (permalink / raw)
  To: Maciej Fijalkowski; +Cc: netdev, kuba, intel-wired-lan, davem

On Wed, Oct 05, 2022 at 12:46:31PM +0200, Maciej Fijalkowski wrote:
> On Wed, Oct 05, 2022 at 01:31:42AM -0700, Joe Damato wrote:
> > Adjust i40e_clean_tx_irq to return the actual number of packets cleaned
> > and adjust the logic in i40e_napi_poll to check this value.
> > 
> > Signed-off-by: Joe Damato <jdamato@fastly.com>
> > ---
> >  drivers/net/ethernet/intel/i40e/i40e_txrx.c | 24 +++++++++++++-----------
> >  drivers/net/ethernet/intel/i40e/i40e_xsk.c  | 12 ++++++------
> >  drivers/net/ethernet/intel/i40e/i40e_xsk.h  |  2 +-
> >  3 files changed, 20 insertions(+), 18 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> > index b97c95f..ed88309 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> > @@ -924,10 +924,10 @@ void i40e_detect_recover_hung(struct i40e_vsi *vsi)
> >   * @tx_ring: Tx ring to clean
> >   * @napi_budget: Used to determine if we are in netpoll
> >   *
> > - * Returns true if there's any budget left (e.g. the clean is finished)
> > + * Returns the number of packets cleaned
> >   **/
> > -static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
> > -			      struct i40e_ring *tx_ring, int napi_budget)
> > +static int i40e_clean_tx_irq(struct i40e_vsi *vsi,
> > +			     struct i40e_ring *tx_ring, int napi_budget)
> >  {
> >  	int i = tx_ring->next_to_clean;
> >  	struct i40e_tx_buffer *tx_buf;
> > @@ -1026,7 +1026,7 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
> >  	i40e_arm_wb(tx_ring, vsi, budget);
> >  
> >  	if (ring_is_xdp(tx_ring))
> > -		return !!budget;
> > +		return total_packets;
> >  
> >  	/* notify netdev of completed buffers */
> >  	netdev_tx_completed_queue(txring_txq(tx_ring),
> > @@ -1048,7 +1048,7 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
> >  		}
> >  	}
> >  
> > -	return !!budget;
> > +	return total_packets;
> >  }
> >  
> >  /**
> > @@ -2689,10 +2689,12 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
> >  			       container_of(napi, struct i40e_q_vector, napi);
> >  	struct i40e_vsi *vsi = q_vector->vsi;
> >  	struct i40e_ring *ring;
> > +	bool tx_clean_complete = true;
> >  	bool clean_complete = true;
> >  	bool arm_wb = false;
> >  	int budget_per_ring;
> >  	int work_done = 0;
> > +	int tx_wd = 0;
> >  
> >  	if (test_bit(__I40E_VSI_DOWN, vsi->state)) {
> >  		napi_complete(napi);
> > @@ -2703,12 +2705,12 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
> >  	 * budget and be more aggressive about cleaning up the Tx descriptors.
> >  	 */
> >  	i40e_for_each_ring(ring, q_vector->tx) {
> > -		bool wd = ring->xsk_pool ?
> > -			  i40e_clean_xdp_tx_irq(vsi, ring) :
> > -			  i40e_clean_tx_irq(vsi, ring, budget);
> > +		tx_wd = ring->xsk_pool ?
> > +			i40e_clean_xdp_tx_irq(vsi, ring) :
> > +			i40e_clean_tx_irq(vsi, ring, budget);
> >  
> > -		if (!wd) {
> > -			clean_complete = false;
> > +		if (tx_wd >= budget) {
> > +			tx_clean_complete = false;
> 
> This will break for AF_XDP Tx ZC. AF_XDP Tx ZC in intel drivers ignores the
> budget given by NAPI. If you look at i40e_xmit_zc():
> 
> func def:
> static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
> 
> callsite:
> 	return i40e_xmit_zc(tx_ring, I40E_DESC_UNUSED(tx_ring));
> 
> We give the free ring space as the budget, so with your change we would be
> returning the number of processed tx descriptors, which you will then be
> comparing against the NAPI budget (64, unless you have busy poll enabled with
> a different batch size). Say you start with an empty ring and your HW rings
> are sized to 1k, but there were only 512 AF_XDP descriptors ready for Tx.
> You produced all of them successfully to the ring and you return 512 up to
> i40e_napi_poll.

Good point, my bad.

I've reworked this for the v2 and have given i40e_clean_tx_irq
and i40e_clean_xdp_tx_irq an out parameter which records the number of
TX packets cleaned.

I tweaked i40e_xmit_zc to return the number of packets (nb_pkts) and moved
the boolean check that it is under the "budget"
(I40E_DESC_UNUSED(tx_ring)) into i40e_clean_xdp_tx_irq.

I think that might solve the issues you've described.


> >  			continue;
> >  		}
> >  		arm_wb |= ring->arm_wb;
> > @@ -2742,7 +2744,7 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
> >  	}
> >  
> >  	/* If work not completed, return budget and polling will return */
> > -	if (!clean_complete) {
> > +	if (!clean_complete || !tx_clean_complete) {
> >  		int cpu_id = smp_processor_id();
> >  
> >  		/* It is possible that the interrupt affinity has changed but,
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> > index 790aaeff..925682c 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> > @@ -531,9 +531,9 @@ static void i40e_set_rs_bit(struct i40e_ring *xdp_ring)
> >   * @xdp_ring: XDP Tx ring
> >   * @budget: NAPI budget
> >   *
> > - * Returns true if the work is finished.
> > + * Returns number of packets cleaned
> >   **/
> > -static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
> > +static int i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
> >  {
> >  	struct xdp_desc *descs = xdp_ring->xsk_pool->tx_descs;
> >  	u32 nb_pkts, nb_processed = 0;
> > @@ -541,7 +541,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
> >  
> >  	nb_pkts = xsk_tx_peek_release_desc_batch(xdp_ring->xsk_pool, budget);
> >  	if (!nb_pkts)
> > -		return true;
> > +		return 0;
> >  
> >  	if (xdp_ring->next_to_use + nb_pkts >= xdp_ring->count) {
> >  		nb_processed = xdp_ring->count - xdp_ring->next_to_use;
> > @@ -558,7 +558,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
> >  
> >  	i40e_update_tx_stats(xdp_ring, nb_pkts, total_bytes);
> >  
> > -	return nb_pkts < budget;
> > +	return nb_pkts;
> >  }
> >  
> >  /**
> > @@ -582,9 +582,9 @@ static void i40e_clean_xdp_tx_buffer(struct i40e_ring *tx_ring,
> >   * @vsi: Current VSI
> >   * @tx_ring: XDP Tx ring
> >   *
> > - * Returns true if cleanup/tranmission is done.
> > + * Returns number of packets cleaned
> >   **/
> > -bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring)
> > +int i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring)
> >  {
> >  	struct xsk_buff_pool *bp = tx_ring->xsk_pool;
> >  	u32 i, completed_frames, xsk_frames = 0;
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.h b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> > index 821df24..4e810c2 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> > @@ -30,7 +30,7 @@ int i40e_xsk_pool_setup(struct i40e_vsi *vsi, struct xsk_buff_pool *pool,
> >  bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 cleaned_count);
> >  int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget);
> >  
> > -bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring);
> > +int i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring);
> >  int i40e_xsk_wakeup(struct net_device *dev, u32 queue_id, u32 flags);
> >  int i40e_realloc_rx_bi_zc(struct i40e_vsi *vsi, bool zc);
> >  void i40e_clear_rx_bi_zc(struct i40e_ring *rx_ring);
> > -- 
> > 2.7.4
> > 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [next-queue 3/3] i40e: Add i40e_napi_poll tracepoint
  2022-10-05 10:27     ` [Intel-wired-lan] " Maciej Fijalkowski
@ 2022-10-05 17:56       ` Joe Damato
  -1 siblings, 0 replies; 34+ messages in thread
From: Joe Damato @ 2022-10-05 17:56 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: intel-wired-lan, netdev, kuba, davem, anthony.l.nguyen, jesse.brandeburg

On Wed, Oct 05, 2022 at 12:27:30PM +0200, Maciej Fijalkowski wrote:
> On Wed, Oct 05, 2022 at 01:31:43AM -0700, Joe Damato wrote:
> 
> Hi Joe,
> 
> > Add a tracepoint for i40e_napi_poll that allows users to get detailed
> > information about the amount of work done. This information can help users
> > better tune the correct NAPI parameters (like weight and budget), as well
> > as debug NIC settings like rx-usecs and tx-usecs, etc.
> > 
> > An example of the output from this tracepoint:
> > 
> > [...snip...]
> > 
> > 1029.268 :0/0 i40e:i40e_napi_poll(i40e_napi_poll on dev eth1 q
> > i40e-eth1-TxRx-30 irq 172 irq_mask
> > 00000000,00000000,00000000,00000010,00000000,00000000 curr_cpu 68 budget
> > 64 bpr 64 work_done 0 tx_work_done 2 clean_complete 1 tx_clean_complete
> > 1)
> > 	i40e_napi_poll ([i40e])
> > 	i40e_napi_poll ([i40e])
> > 	__napi_poll ([kernel.kallsyms])
> > 	net_rx_action ([kernel.kallsyms])
> > 	__do_softirq ([kernel.kallsyms])
> > 	common_interrupt ([kernel.kallsyms])
> > 	asm_common_interrupt ([kernel.kallsyms])
> > 	intel_idle_irq ([kernel.kallsyms])
> > 	cpuidle_enter_state ([kernel.kallsyms])
> > 	cpuidle_enter ([kernel.kallsyms])
> > 	do_idle ([kernel.kallsyms])
> > 	cpu_startup_entry ([kernel.kallsyms])
> > 	[0x243fd8] ([kernel.kallsyms])
> > 	secondary_startup_64_no_verify ([kernel.kallsyms])
> 
> Maybe you could also include how to configure this tracepoint for future
> readers?

Ah, for some reason I deleted that line from the commit message. Will
include it in the v2.
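For reference, the usual ways to enable such a tracepoint once the module is loaded (a sketch only; the event name assumes a kernel with this patch applied, and the tracefs path assumes tracefs is mounted in its default location):

```shell
# Record the new event system-wide with perf and inspect it afterwards:
perf record -e i40e:i40e_napi_poll -a -- sleep 10
perf script

# Or enable it directly through tracefs and watch the events stream in:
echo 1 > /sys/kernel/tracing/events/i40e/i40e_napi_poll/enable
cat /sys/kernel/tracing/trace_pipe
```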

> > 
> > Signed-off-by: Joe Damato <jdamato@fastly.com>
> > ---
> >  drivers/net/ethernet/intel/i40e/i40e_trace.h | 50 ++++++++++++++++++++++++++++
> >  drivers/net/ethernet/intel/i40e/i40e_txrx.c  |  3 ++
> >  2 files changed, 53 insertions(+)
> > 
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_trace.h b/drivers/net/ethernet/intel/i40e/i40e_trace.h
> > index b5b1229..779d046 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e_trace.h
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_trace.h
> > @@ -55,6 +55,56 @@
> >   * being built from shared code.
> >   */
> >  
> > +#define NO_DEV "(i40e no_device)"
> > +
> > +TRACE_EVENT(i40e_napi_poll,
> > +
> > +	TP_PROTO(struct napi_struct *napi, struct i40e_q_vector *q, int budget,
> > +		 int budget_per_ring, int work_done, int tx_work_done, bool clean_complete,
> > +		 bool tx_clean_complete),
> > +
> > +	TP_ARGS(napi, q, budget, budget_per_ring, work_done, tx_work_done,
> > +		clean_complete, tx_clean_complete),
> > +
> > +	TP_STRUCT__entry(
> > +		__field(int, budget)
> > +		__field(int, budget_per_ring)
> > +		__field(int, work_done)
> > +		__field(int, tx_work_done)
> > +		__field(int, clean_complete)
> > +		__field(int, tx_clean_complete)
> > +		__field(int, irq_num)
> > +		__field(int, curr_cpu)
> > +		__string(qname, q->name)
> > +		__string(dev_name, napi->dev ? napi->dev->name : NO_DEV)
> > +		__bitmask(irq_affinity,	nr_cpumask_bits)
> > +	),
> > +
> > +	TP_fast_assign(
> > +		__entry->budget = budget;
> > +		__entry->budget_per_ring = budget_per_ring;
> > +		__entry->work_done = work_done;
> 
> What if rx clean routines failed to do allocation of new rx bufs? then
> this would be misinterpreted. maybe we should change the API to
> 
> static bool
> i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget,
> 		  unsigned int *processed_pkts);
> 
> so you would return failure and at the end do
> 	*processed_pkts = total_rx_packets;

I reworked i40e_clean_rx_irq and i40e_clean_rx_irq_zc to do what you've
described -- this is also how I ended up approaching counting TX cleaned.

> 
> then also i would change the naming of tracepoint entry. I'm not a native
> english speaker but having 'done' within the variable name suggests to me
> that it is rather a boolean. what about something like 'rx_cleaned_pkts'
> instead?

Sure, I've changed the trace prototype, struct, and print statement to use
"rx_cleaned" and "tx_cleaned" instead of "work_done" and "tx_work_done".

>
> Generally I think this is useful, personally I was in need of tracing the
> next_to_clean and next_to_use ring indexes a lot, but that is probably out
> of the scope in here.

Yea, I've used those for debugging other things, as well - they are quite
useful, but I agree... I think that's out of scope for this set :)

Thank you very much for all your detailed feedback. Hopefully the v2 will
be a bit closer.

> > +		__entry->tx_work_done = tx_work_done;
> > +		__entry->clean_complete = clean_complete;
> > +		__entry->tx_clean_complete = tx_clean_complete;
> > +		__entry->irq_num = q->irq_num;
> > +		__entry->curr_cpu = get_cpu();
> > +		__assign_str(qname, q->name);
> > +		__assign_str(dev_name, napi->dev ? napi->dev->name : NO_DEV);
> > +		__assign_bitmask(irq_affinity, cpumask_bits(&q->affinity_mask),
> > +				 nr_cpumask_bits);
> > +	),
> > +
> > +	TP_printk("i40e_napi_poll on dev %s q %s irq %d irq_mask %s curr_cpu %d "
> > +		  "budget %d bpr %d work_done %d tx_work_done %d "
> > +		  "clean_complete %d tx_clean_complete %d",
> > +		__get_str(dev_name), __get_str(qname), __entry->irq_num,
> > +		__get_bitmask(irq_affinity), __entry->curr_cpu, __entry->budget,
> > +		__entry->budget_per_ring, __entry->work_done,
> > +		__entry->tx_work_done,
> > +		__entry->clean_complete, __entry->tx_clean_complete)
> > +);
> > +
> >  /* Events related to a vsi & ring */
> >  DECLARE_EVENT_CLASS(
> >  	i40e_tx_template,
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> > index ed88309..8b72f1b 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> > @@ -2743,6 +2743,9 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
> >  			clean_complete = false;
> >  	}
> >  
> > +	trace_i40e_napi_poll(napi, q_vector, budget, budget_per_ring, work_done, tx_wd,
> > +			     clean_complete, tx_clean_complete);
> > +
> >  	/* If work not completed, return budget and polling will return */
> >  	if (!clean_complete || !tx_clean_complete) {
> >  		int cpu_id = smp_processor_id();
> > -- 
> > 2.7.4
> > 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [next-queue 3/3] i40e: Add i40e_napi_poll tracepoint
@ 2022-10-05 17:56       ` Joe Damato
  0 siblings, 0 replies; 34+ messages in thread
From: Joe Damato @ 2022-10-05 17:56 UTC (permalink / raw)
  To: Maciej Fijalkowski; +Cc: netdev, kuba, intel-wired-lan, davem

On Wed, Oct 05, 2022 at 12:27:30PM +0200, Maciej Fijalkowski wrote:
> On Wed, Oct 05, 2022 at 01:31:43AM -0700, Joe Damato wrote:
> 
> Hi Joe,
> 
> > Add a tracepoint for i40e_napi_poll that allows users to get detailed
> > information about the amount of work done. This information can help users
> > better tune the correct NAPI parameters (like weight and budget), as well
> > as debug NIC settings like rx-usecs and tx-usecs, etc.
> > 
> > An example of the output from this tracepoint:
> > 
> > [...snip...]
> > 
> > 1029.268 :0/0 i40e:i40e_napi_poll(i40e_napi_poll on dev eth1 q
> > i40e-eth1-TxRx-30 irq 172 irq_mask
> > 00000000,00000000,00000000,00000010,00000000,00000000 curr_cpu 68 budget
> > 64 bpr 64 work_done 0 tx_work_done 2 clean_complete 1 tx_clean_complete
> > 1)
> > 	i40e_napi_poll ([i40e])
> > 	i40e_napi_poll ([i40e])
> > 	__napi_poll ([kernel.kallsyms])
> > 	net_rx_action ([kernel.kallsyms])
> > 	__do_softirq ([kernel.kallsyms])
> > 	common_interrupt ([kernel.kallsyms])
> > 	asm_common_interrupt ([kernel.kallsyms])
> > 	intel_idle_irq ([kernel.kallsyms])
> > 	cpuidle_enter_state ([kernel.kallsyms])
> > 	cpuidle_enter ([kernel.kallsyms])
> > 	do_idle ([kernel.kallsyms])
> > 	cpu_startup_entry ([kernel.kallsyms])
> > 	[0x243fd8] ([kernel.kallsyms])
> > 	secondary_startup_64_no_verify ([kernel.kallsyms])
> 
> maybe you could also include how to configure this tracepoint for future
> readers?

Ah, for some reason I deleted that line from the commit message. Will
include it in the v2.
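For future readers, before the v2 commit message lands: the usual way to configure a tracepoint like this is through tracefs or perf. A sketch, assuming the event is registered under the `i40e` subsystem as `i40e_napi_poll` (as in this patch):

```shell
# Enable via tracefs and stream events (requires root and the patched driver):
#   echo 1 > /sys/kernel/tracing/events/i40e/i40e_napi_poll/enable
#   cat /sys/kernel/tracing/trace_pipe
#
# Or record system-wide with call stacks via perf, as in the output quoted
# above, then inspect:
#   perf record -e i40e:i40e_napi_poll -a -g -- sleep 10
#   perf script
```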

> > 
> > Signed-off-by: Joe Damato <jdamato@fastly.com>
> > ---
> >  drivers/net/ethernet/intel/i40e/i40e_trace.h | 50 ++++++++++++++++++++++++++++
> >  drivers/net/ethernet/intel/i40e/i40e_txrx.c  |  3 ++
> >  2 files changed, 53 insertions(+)
> > 
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_trace.h b/drivers/net/ethernet/intel/i40e/i40e_trace.h
> > index b5b1229..779d046 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e_trace.h
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_trace.h
> > @@ -55,6 +55,56 @@
> >   * being built from shared code.
> >   */
> >  
> > +#define NO_DEV "(i40e no_device)"
> > +
> > +TRACE_EVENT(i40e_napi_poll,
> > +
> > +	TP_PROTO(struct napi_struct *napi, struct i40e_q_vector *q, int budget,
> > +		 int budget_per_ring, int work_done, int tx_work_done, bool clean_complete,
> > +		 bool tx_clean_complete),
> > +
> > +	TP_ARGS(napi, q, budget, budget_per_ring, work_done, tx_work_done,
> > +		clean_complete, tx_clean_complete),
> > +
> > +	TP_STRUCT__entry(
> > +		__field(int, budget)
> > +		__field(int, budget_per_ring)
> > +		__field(int, work_done)
> > +		__field(int, tx_work_done)
> > +		__field(int, clean_complete)
> > +		__field(int, tx_clean_complete)
> > +		__field(int, irq_num)
> > +		__field(int, curr_cpu)
> > +		__string(qname, q->name)
> > +		__string(dev_name, napi->dev ? napi->dev->name : NO_DEV)
> > +		__bitmask(irq_affinity,	nr_cpumask_bits)
> > +	),
> > +
> > +	TP_fast_assign(
> > +		__entry->budget = budget;
> > +		__entry->budget_per_ring = budget_per_ring;
> > +		__entry->work_done = work_done;
> 
> What if rx clean routines failed to do allocation of new rx bufs? Then
> this would be misinterpreted. Maybe we should change the API to
> 
> static bool
> i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget,
> 		  unsigned int *processed_pkts);
> 
> so you would return failure and at the end do
> 	*processed_pkts = total_rx_packets;

I reworked i40e_clean_rx_irq and i40e_clean_rx_irq_zc to do what you've
described -- this is also how I ended up approaching counting TX cleaned.

> 
> Then I would also change the naming of the tracepoint entry. I'm not a native
> English speaker, but having 'done' within the variable name suggests to me
> that it is rather a boolean. What about something like 'rx_cleaned_pkts'
> instead?

Sure, I've changed the trace prototype, struct, and print statement to use
"rx_cleaned" and "tx_cleaned" instead of "work_done" and "tx_work_done".

>
> Generally I think this is useful, personally I was in need of tracing the
> next_to_clean and next_to_use ring indexes a lot, but that is probably out
> of the scope in here.

Yea, I've used those for debugging other things, as well - they are quite
useful, but I agree... I think that's out of scope for this set :)

Thank you very much for all your detailed feedback. Hopefully the v2 will
be a bit closer.

> > +		__entry->tx_work_done = tx_work_done;
> > +		__entry->clean_complete = clean_complete;
> > +		__entry->tx_clean_complete = tx_clean_complete;
> > +		__entry->irq_num = q->irq_num;
> > +		__entry->curr_cpu = get_cpu();
> > +		__assign_str(qname, q->name);
> > +		__assign_str(dev_name, napi->dev ? napi->dev->name : NO_DEV);
> > +		__assign_bitmask(irq_affinity, cpumask_bits(&q->affinity_mask),
> > +				 nr_cpumask_bits);
> > +	),
> > +
> > +	TP_printk("i40e_napi_poll on dev %s q %s irq %d irq_mask %s curr_cpu %d "
> > +		  "budget %d bpr %d work_done %d tx_work_done %d "
> > +		  "clean_complete %d tx_clean_complete %d",
> > +		__get_str(dev_name), __get_str(qname), __entry->irq_num,
> > +		__get_bitmask(irq_affinity), __entry->curr_cpu, __entry->budget,
> > +		__entry->budget_per_ring, __entry->work_done,
> > +		__entry->tx_work_done,
> > +		__entry->clean_complete, __entry->tx_clean_complete)
> > +);
> > +
> >  /* Events related to a vsi & ring */
> >  DECLARE_EVENT_CLASS(
> >  	i40e_tx_template,
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> > index ed88309..8b72f1b 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> > @@ -2743,6 +2743,9 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
> >  			clean_complete = false;
> >  	}
> >  
> > +	trace_i40e_napi_poll(napi, q_vector, budget, budget_per_ring, work_done, tx_wd,
> > +			     clean_complete, tx_clean_complete);
> > +
> >  	/* If work not completed, return budget and polling will return */
> >  	if (!clean_complete || !tx_clean_complete) {
> >  		int cpu_id = smp_processor_id();
> > -- 
> > 2.7.4
> > 
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [next-queue 1/3] i40e: Store the irq number in i40e_q_vector
  2022-10-05 17:00       ` [Intel-wired-lan] " Joe Damato
@ 2022-10-05 18:25         ` Jesse Brandeburg
  -1 siblings, 0 replies; 34+ messages in thread
From: Jesse Brandeburg @ 2022-10-05 18:25 UTC (permalink / raw)
  To: Joe Damato, Maciej Fijalkowski
  Cc: intel-wired-lan, netdev, kuba, davem, anthony.l.nguyen

On 10/5/2022 10:00 AM, Joe Damato wrote:
> On Wed, Oct 05, 2022 at 12:29:24PM +0200, Maciej Fijalkowski wrote:
>> On Wed, Oct 05, 2022 at 01:31:41AM -0700, Joe Damato wrote:
>>> Make it easy to figure out the IRQ number for a particular i40e_q_vector by
>>> storing the assigned IRQ in the structure itself.
>>>
>>> Signed-off-by: Joe Damato <jdamato@fastly.com>
>>> ---
>>>   drivers/net/ethernet/intel/i40e/i40e.h      | 1 +
>>>   drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
>>>   2 files changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
>>> index 9926c4e..8e1f395 100644
>>> --- a/drivers/net/ethernet/intel/i40e/i40e.h
>>> +++ b/drivers/net/ethernet/intel/i40e/i40e.h
>>> @@ -992,6 +992,7 @@ struct i40e_q_vector {
>>>   	struct rcu_head rcu;	/* to avoid race with update stats on free */
>>>   	char name[I40E_INT_NAME_STR_LEN];
>>>   	bool arm_wb_state;
>>> +	int irq_num;		/* IRQ assigned to this q_vector */
>>
>> This struct looks like a mess in terms of members order. Can you check
>> with pahole how your patch affects the layout of it? Maybe while at it you
>> could pack it in a better way?
> 
> OK, sure. I used pahole and asked it to reorganize the struct members,
> which saves 24 bytes.

Hi Joe, thanks for your patches,

Saving 24 bytes is admirable, but these structures are generally 
optimized in access-pattern order (most used at the top) and not so much 
for "packing efficiency", especially since the alignment directive at 
the bottom causes each struct to start on its own cacheline anyway.


> 
> I'll update this commit to include the following reorganization in the v2 of
> this set:
> 
> $ pahole -R -C i40e_q_vector i40e.ko
> 
> struct i40e_q_vector {
> 	struct i40e_vsi *          vsi;                  /*     0     8 */
> 	u16                        v_idx;                /*     8     2 */
> 	u16                        reg_idx;              /*    10     2 */
> 	u8                         num_ringpairs;        /*    12     1 */
> 	u8                         itr_countdown;        /*    13     1 */
> 	bool                       arm_wb_state;         /*    14     1 */
> 
> 	/* XXX 1 byte hole, try to pack */
> 
> 	struct napi_struct         napi;                 /*    16   400 */
> 	/* --- cacheline 6 boundary (384 bytes) was 32 bytes ago --- */
> 	struct i40e_ring_container rx;                   /*   416    32 */
> 	/* --- cacheline 7 boundary (448 bytes) --- */
> 	struct i40e_ring_container tx;                   /*   448    32 */
> 	cpumask_t                  affinity_mask;        /*   480    24 */
> 	struct irq_affinity_notify affinity_notify;      /*   504    56 */
> 	/* --- cacheline 8 boundary (512 bytes) was 48 bytes ago --- */
> 	struct callback_head       rcu;                  /*   560    16 */
> 	/* --- cacheline 9 boundary (576 bytes) --- */
> 	char                       name[32];             /*   576    32 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	int                        irq_num;              /*   612     4 */

The right spot for this debug item is at the end of the struct, so that 
part is good.

> 
> 	/* size: 616, cachelines: 10, members: 14 */
> 	/* sum members: 611, holes: 2, sum holes: 5 */
> 	/* last cacheline: 40 bytes */
> };   /* saved 24 bytes! */

I'd prefer it if you don't do two things at once in a single patch (add 
members / reorganize).

I know Maciej said this is a mess and I kind of agree with him, but I'm 
not sure it's a priority for your patch set to fix it now, especially 
since you're trying to add a debugging assist, and not performance 
tuning the code.

If you really want to reorganize these structs I'd prefer a bit more 
diligent effort to prove there are no inadvertent side effects (like 
maybe turning up the interrupt rate and looking at perf data while 
receiving 512-byte packets). The rate should remain the same (or better) 
and the number of cache misses on these structs should remain roughly 
the same. Maybe a separate patch series?

Jesse

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [next-queue 2/3] i40e: i40e_clean_tx_irq returns work done
  2022-10-05 17:50       ` [Intel-wired-lan] " Joe Damato
@ 2022-10-05 18:33         ` Jesse Brandeburg
  -1 siblings, 0 replies; 34+ messages in thread
From: Jesse Brandeburg @ 2022-10-05 18:33 UTC (permalink / raw)
  To: Joe Damato, Maciej Fijalkowski
  Cc: intel-wired-lan, netdev, kuba, davem, anthony.l.nguyen

On 10/5/2022 10:50 AM, Joe Damato wrote:
> On Wed, Oct 05, 2022 at 12:46:31PM +0200, Maciej Fijalkowski wrote:
>> On Wed, Oct 05, 2022 at 01:31:42AM -0700, Joe Damato wrote:
>>> Adjust i40e_clean_tx_irq to return the actual number of packets cleaned
>>> and adjust the logic in i40e_napi_poll to check this value.

It's fine to return the number cleaned, but let's keep that data and its 
changes self-contained instead of changing the flow of the routine.


>>>
>>> Signed-off-by: Joe Damato <jdamato@fastly.com>
>>> ---
>>>   drivers/net/ethernet/intel/i40e/i40e_txrx.c | 24 +++++++++++++-----------
>>>   drivers/net/ethernet/intel/i40e/i40e_xsk.c  | 12 ++++++------
>>>   drivers/net/ethernet/intel/i40e/i40e_xsk.h  |  2 +-
>>>   3 files changed, 20 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
>>> index b97c95f..ed88309 100644
>>> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
>>> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
>>> @@ -924,10 +924,10 @@ void i40e_detect_recover_hung(struct i40e_vsi *vsi)
>>>    * @tx_ring: Tx ring to clean
>>>    * @napi_budget: Used to determine if we are in netpoll
>>>    *
>>> - * Returns true if there's any budget left (e.g. the clean is finished)
>>> + * Returns the number of packets cleaned
>>>    **/
>>> -static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
>>> -			      struct i40e_ring *tx_ring, int napi_budget)
>>> +static int i40e_clean_tx_irq(struct i40e_vsi *vsi,
>>> +			     struct i40e_ring *tx_ring, int napi_budget)
>>>   {
>>>   	int i = tx_ring->next_to_clean;
>>>   	struct i40e_tx_buffer *tx_buf;
>>> @@ -1026,7 +1026,7 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
>>>   	i40e_arm_wb(tx_ring, vsi, budget);
>>>   
>>>   	if (ring_is_xdp(tx_ring))
>>> -		return !!budget;
>>> +		return total_packets;
>>>   
>>>   	/* notify netdev of completed buffers */
>>>   	netdev_tx_completed_queue(txring_txq(tx_ring),
>>> @@ -1048,7 +1048,7 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
>>>   		}
>>>   	}
>>>   
>>> -	return !!budget;
>>> +	return total_packets;
>>>   }
>>>   
>>>   /**
>>> @@ -2689,10 +2689,12 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
>>>   			       container_of(napi, struct i40e_q_vector, napi);
>>>   	struct i40e_vsi *vsi = q_vector->vsi;
>>>   	struct i40e_ring *ring;
>>> +	bool tx_clean_complete = true;
>>>   	bool clean_complete = true;
>>>   	bool arm_wb = false;
>>>   	int budget_per_ring;
>>>   	int work_done = 0;
>>> +	int tx_wd = 0;
>>>   
>>>   	if (test_bit(__I40E_VSI_DOWN, vsi->state)) {
>>>   		napi_complete(napi);
>>> @@ -2703,12 +2705,12 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
>>>   	 * budget and be more aggressive about cleaning up the Tx descriptors.
>>>   	 */
>>>   	i40e_for_each_ring(ring, q_vector->tx) {
>>> -		bool wd = ring->xsk_pool ?
>>> -			  i40e_clean_xdp_tx_irq(vsi, ring) :
>>> -			  i40e_clean_tx_irq(vsi, ring, budget);
>>> +		tx_wd = ring->xsk_pool ?
>>> +			i40e_clean_xdp_tx_irq(vsi, ring) :
>>> +			i40e_clean_tx_irq(vsi, ring, budget);
>>>   
>>> -		if (!wd) {
>>> -			clean_complete = false;
>>> +		if (tx_wd >= budget) {
>>> +			tx_clean_complete = false;
>>
>> This will break for AF_XDP Tx ZC. AF_XDP Tx ZC in intel drivers ignores
>> budget given by NAPI. If you look at i40e_xmit_zc():
>>
>> func def:
>> static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
>>
>> callsite:
>> 	return i40e_xmit_zc(tx_ring, I40E_DESC_UNUSED(tx_ring));
>>
>> we give free ring space as a budget and with your change we would be
>> returning the amount of processed tx descriptors which you will be
>> comparing against NAPI budget (64, unless you have busy poll enabled with
>> a different batch size). Say you start with empty ring and your HW rings
>> are sized to 1k but there was only 512 AF_XDP descriptors ready for Tx.
>> You produced all of them successfully to ring and you return 512 up to
>> i40e_napi_poll.
> 
> Good point, my bad.
> 
> I've reworked this for the v2 and have given i40e_clean_tx_irq,
> and i40e_clean_xdp_tx_irq an out parameter which will record the number
> TXes cleaned.
> 
> I tweaked i40e_xmit_zc to return the number of packets (nb_pkts) and moved
> the boolean to check if that's under the "budget"
> (I40E_DESC_UNUSED(tx_ring)) into i40e_clean_xdp_tx_irq.
> 
> I think that might solve the issues you've described.

Please don't change the flow of this function; transmit clean-ups are so 
cheap that we don't bother counting them or limiting them beyond a 
maximum (so they don't clean forever).

Basically, transmits should not be counted when exiting NAPI, beyond 
noting that we did "at least one". The only thing that matters to the 
budget is whether we "finished" transmit cleanup or not, which makes 
sure we reschedule NAPI if we weren't finished cleaning (for instance 
on an 8160-entry tx ring).

I'd much rather you kept this series to a simple return count of tx 
cleaned in "out" as you've said you'd do in v2, and then use that data 
*only* in the context of the new trace event.

That way you're not changing the flow and introducing tough to debug 
issues in the hot path.

Also, mixing new features with refactors makes it very hard to unwind 
if something goes wrong in the future.

Thanks,
Jesse

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [next-queue 1/3] i40e: Store the irq number in i40e_q_vector
  2022-10-05 18:25         ` [Intel-wired-lan] " Jesse Brandeburg
@ 2022-10-05 18:40           ` Joe Damato
  -1 siblings, 0 replies; 34+ messages in thread
From: Joe Damato @ 2022-10-05 18:40 UTC (permalink / raw)
  To: Jesse Brandeburg; +Cc: netdev, intel-wired-lan, kuba, davem

On Wed, Oct 05, 2022 at 11:25:32AM -0700, Jesse Brandeburg wrote:
> On 10/5/2022 10:00 AM, Joe Damato wrote:
> >On Wed, Oct 05, 2022 at 12:29:24PM +0200, Maciej Fijalkowski wrote:
> >>On Wed, Oct 05, 2022 at 01:31:41AM -0700, Joe Damato wrote:
> >>>Make it easy to figure out the IRQ number for a particular i40e_q_vector by
> >>>storing the assigned IRQ in the structure itself.
> >>>
> >>>Signed-off-by: Joe Damato <jdamato@fastly.com>
> >>>---
> >>>  drivers/net/ethernet/intel/i40e/i40e.h      | 1 +
> >>>  drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
> >>>  2 files changed, 2 insertions(+)
> >>>
> >>>diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
> >>>index 9926c4e..8e1f395 100644
> >>>--- a/drivers/net/ethernet/intel/i40e/i40e.h
> >>>+++ b/drivers/net/ethernet/intel/i40e/i40e.h
> >>>@@ -992,6 +992,7 @@ struct i40e_q_vector {
> >>>  	struct rcu_head rcu;	/* to avoid race with update stats on free */
> >>>  	char name[I40E_INT_NAME_STR_LEN];
> >>>  	bool arm_wb_state;
> >>>+	int irq_num;		/* IRQ assigned to this q_vector */
> >>
> >>This struct looks like a mess in terms of members order. Can you check
> >>with pahole how your patch affects the layout of it? Maybe while at it you
> >>could pack it in a better way?
> >
> >OK, sure. I used pahole and asked it to reorganize the struct members,
> >which saves 24 bytes.
> 
> Hi Joe, thanks for your patches,
> 
> Saving 24 bytes is admirable, but these structures are generally optimized
> in access pattern order (most used at the top) and not so much for "packing
> efficiency" especially since it has that alignment directive at the bottom
> which causes each struct to start at its own cacheline anyway.
> 
> 
> >
> >I'll update this commit to include the following reorganization in the v2 of
> >this set:
> >
> >$ pahole -R -C i40e_q_vector i40e.ko
> >
> >struct i40e_q_vector {
> >	struct i40e_vsi *          vsi;                  /*     0     8 */
> >	u16                        v_idx;                /*     8     2 */
> >	u16                        reg_idx;              /*    10     2 */
> >	u8                         num_ringpairs;        /*    12     1 */
> >	u8                         itr_countdown;        /*    13     1 */
> >	bool                       arm_wb_state;         /*    14     1 */
> >
> >	/* XXX 1 byte hole, try to pack */
> >
> >	struct napi_struct         napi;                 /*    16   400 */
> >	/* --- cacheline 6 boundary (384 bytes) was 32 bytes ago --- */
> >	struct i40e_ring_container rx;                   /*   416    32 */
> >	/* --- cacheline 7 boundary (448 bytes) --- */
> >	struct i40e_ring_container tx;                   /*   448    32 */
> >	cpumask_t                  affinity_mask;        /*   480    24 */
> >	struct irq_affinity_notify affinity_notify;      /*   504    56 */
> >	/* --- cacheline 8 boundary (512 bytes) was 48 bytes ago --- */
> >	struct callback_head       rcu;                  /*   560    16 */
> >	/* --- cacheline 9 boundary (576 bytes) --- */
> >	char                       name[32];             /*   576    32 */
> >
> >	/* XXX 4 bytes hole, try to pack */
> >
> >	int                        irq_num;              /*   612     4 */
> 
> The right spot for this debug item is at the end of the struct, so that part
> is good.
> 
> >
> >	/* size: 616, cachelines: 10, members: 14 */
> >	/* sum members: 611, holes: 2, sum holes: 5 */
> >	/* last cacheline: 40 bytes */
> >};   /* saved 24 bytes! */
> 
> I'd prefer it if you don't do two things at once in a single patch (add
> members / reorganize).
> 
> I know Maciej said this is a mess and I kind of agree with him, but I'm not
> sure it's a priority for your patch set to fix it now, especially since
> you're trying to add a debugging assist, and not performance tuning the
> code.
> 
> If you're really wanting to reorganize these structs I'd prefer a bit more
> diligent effort to prove no inadvertent side effects (like maybe by turning
> up the interrupt rate and looking at perf data while receiving 512 byte
> packets. The rate should remain the same (or better) and the number of cache
> misses on these structs should remain roughly the same. Maybe a separate
> patch series?

I honestly did think that reorganizing the struct was probably out of
scope of this change, so if you agree I'll drop it from the v2 and keep
the original patch, which adds irq_num to the end of the struct.
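
(For readers following along, the kind of padding hole pahole reports is
easy to reproduce in userspace; the structs below are hypothetical and
much smaller than the real i40e_q_vector:)

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structs (not the real i40e layout): a char before an
 * 8-byte-aligned pointer forces an alignment hole, and a lone trailing
 * char adds tail padding. */
struct hole_prone {
	char  a;	/* 1 byte, then padding up to pointer alignment */
	void *p;	/* pointer-aligned */
	char  b;	/* 1 byte, then tail padding */
};

/* Grouping the small members together removes the interior hole. */
struct packed_better {
	void *p;
	char  a;
	char  b;	/* small members share one padded region */
};
```

This is the trade-off Jesse describes: pahole's reordering shrinks the
struct, but the driver's structs are laid out by access pattern, not
packing efficiency.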

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [next-queue 2/3] i40e: i40e_clean_tx_irq returns work done
  2022-10-05 18:33         ` [Intel-wired-lan] " Jesse Brandeburg
@ 2022-10-05 18:47           ` Joe Damato
  -1 siblings, 0 replies; 34+ messages in thread
From: Joe Damato @ 2022-10-05 18:47 UTC (permalink / raw)
  To: Jesse Brandeburg; +Cc: netdev, intel-wired-lan, kuba, davem

On Wed, Oct 05, 2022 at 11:33:23AM -0700, Jesse Brandeburg wrote:
> On 10/5/2022 10:50 AM, Joe Damato wrote:
> >On Wed, Oct 05, 2022 at 12:46:31PM +0200, Maciej Fijalkowski wrote:
> >>On Wed, Oct 05, 2022 at 01:31:42AM -0700, Joe Damato wrote:
> >>>Adjust i40e_clean_tx_irq to return the actual number of packets cleaned
> >>>and adjust the logic in i40e_napi_poll to check this value.
> 
> It's fine to return the number cleaned, but let's keep that data and
> those changes self-contained instead of changing the flow of the routine.
> 
> 
> >>>
> >>>Signed-off-by: Joe Damato <jdamato@fastly.com>
> >>>---
> >>>  drivers/net/ethernet/intel/i40e/i40e_txrx.c | 24 +++++++++++++-----------
> >>>  drivers/net/ethernet/intel/i40e/i40e_xsk.c  | 12 ++++++------
> >>>  drivers/net/ethernet/intel/i40e/i40e_xsk.h  |  2 +-
> >>>  3 files changed, 20 insertions(+), 18 deletions(-)
> >>>
> >>>diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> >>>index b97c95f..ed88309 100644
> >>>--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> >>>+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> >>>@@ -924,10 +924,10 @@ void i40e_detect_recover_hung(struct i40e_vsi *vsi)
> >>>   * @tx_ring: Tx ring to clean
> >>>   * @napi_budget: Used to determine if we are in netpoll
> >>>   *
> >>>- * Returns true if there's any budget left (e.g. the clean is finished)
> >>>+ * Returns the number of packets cleaned
> >>>   **/
> >>>-static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
> >>>-			      struct i40e_ring *tx_ring, int napi_budget)
> >>>+static int i40e_clean_tx_irq(struct i40e_vsi *vsi,
> >>>+			     struct i40e_ring *tx_ring, int napi_budget)
> >>>  {
> >>>  	int i = tx_ring->next_to_clean;
> >>>  	struct i40e_tx_buffer *tx_buf;
> >>>@@ -1026,7 +1026,7 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
> >>>  	i40e_arm_wb(tx_ring, vsi, budget);
> >>>  	if (ring_is_xdp(tx_ring))
> >>>-		return !!budget;
> >>>+		return total_packets;
> >>>  	/* notify netdev of completed buffers */
> >>>  	netdev_tx_completed_queue(txring_txq(tx_ring),
> >>>@@ -1048,7 +1048,7 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
> >>>  		}
> >>>  	}
> >>>-	return !!budget;
> >>>+	return total_packets;
> >>>  }
> >>>  /**
> >>>@@ -2689,10 +2689,12 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
> >>>  			       container_of(napi, struct i40e_q_vector, napi);
> >>>  	struct i40e_vsi *vsi = q_vector->vsi;
> >>>  	struct i40e_ring *ring;
> >>>+	bool tx_clean_complete = true;
> >>>  	bool clean_complete = true;
> >>>  	bool arm_wb = false;
> >>>  	int budget_per_ring;
> >>>  	int work_done = 0;
> >>>+	int tx_wd = 0;
> >>>  	if (test_bit(__I40E_VSI_DOWN, vsi->state)) {
> >>>  		napi_complete(napi);
> >>>@@ -2703,12 +2705,12 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
> >>>  	 * budget and be more aggressive about cleaning up the Tx descriptors.
> >>>  	 */
> >>>  	i40e_for_each_ring(ring, q_vector->tx) {
> >>>-		bool wd = ring->xsk_pool ?
> >>>-			  i40e_clean_xdp_tx_irq(vsi, ring) :
> >>>-			  i40e_clean_tx_irq(vsi, ring, budget);
> >>>+		tx_wd = ring->xsk_pool ?
> >>>+			i40e_clean_xdp_tx_irq(vsi, ring) :
> >>>+			i40e_clean_tx_irq(vsi, ring, budget);
> >>>-		if (!wd) {
> >>>-			clean_complete = false;
> >>>+		if (tx_wd >= budget) {
> >>>+			tx_clean_complete = false;
> >>
> >>This will break for AF_XDP Tx ZC. AF_XDP Tx ZC in intel drivers ignores
> >>budget given by NAPI. If you look at i40e_xmit_zc():
> >>
> >>func def:
> >>static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
> >>
> >>callsite:
> >>	return i40e_xmit_zc(tx_ring, I40E_DESC_UNUSED(tx_ring));
> >>
> >>we give free ring space as a budget and with your change we would be
> >>returning the amount of processed tx descriptors which you will be
> >>comparing against NAPI budget (64, unless you have busy poll enabled with
> >>a different batch size). Say you start with empty ring and your HW rings
> >>are sized to 1k but there was only 512 AF_XDP descriptors ready for Tx.
> >>You produced all of them successfully to ring and you return 512 up to
> >>i40e_napi_poll.
> >
> >Good point, my bad.
> >
> >I've reworked this for the v2 and have given i40e_clean_tx_irq,
> >and i40e_clean_xdp_tx_irq an out parameter which will record the number
> >TXes cleaned.
> >
> >I tweaked i40e_xmit_zc to return the number of packets (nb_pkts) and moved
> >the boolean to check if that's under the "budget"
> >(I40E_DESC_UNUSED(tx_ring)) into i40e_clean_xdp_tx_irq.
> >
> >I think that might solve the issues you've described.
> 
> Please don't change the flow of this function; transmit clean-ups are so
> cheap that we don't bother counting them or limiting them beyond a maximum
> (so they don't clean forever).
> 
> Basically, transmits should not be counted when exiting NAPI, beyond noting
> that we did "at least one". The only thing that matters to the budget is
> whether we "finished" transmit cleanup or not, which makes sure we
> reschedule NAPI if we weren't finished cleaning transmits (for instance on
> an 8160-entry tx ring).
> 
> I'd much rather you kept this series to a simple return count of tx cleaned
> in "out" as you've said you'd do in v2, and then use that data *only* in the
> context of the new trace event.
> 
> That way you're not changing the flow and introducing tough-to-debug issues
> in the hot path.

In the v2 I've been hacking on, I've added out params to i40e_clean_tx_irq
and i40e_clean_xdp_tx_irq, but I avoided adding an out param in
i40e_xmit_zc, since lifting the boolean out seemed pretty straightforward.

I'll drop that, though, in favor of an out param in i40e_xmit_zc as well,
to avoid changing the flow of the code.
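
(Sketching the shape being discussed, with hypothetical simplified
signatures — the real functions take vsi/ring pointers — where the
boolean return keeps the existing flow and the count is extra data for
the tracepoint only:)

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified: the return value keeps the old "clean is
 * finished" meaning, while *tx_cleaned carries the packet count for the
 * trace event without affecting the NAPI flow. */
bool clean_tx_irq_sketch(int pending, int budget, int *tx_cleaned)
{
	int cleaned = pending < budget ? pending : budget;

	*tx_cleaned = cleaned;		/* consumed only by the tracepoint */
	return pending <= budget;	/* unchanged boolean flow */
}
```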

Thanks for taking a look.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [next-queue 1/3] i40e: Store the irq number in i40e_q_vector
  2022-10-05 18:40           ` Joe Damato
@ 2022-10-05 19:37             ` Jesse Brandeburg
  -1 siblings, 0 replies; 34+ messages in thread
From: Jesse Brandeburg @ 2022-10-05 19:37 UTC (permalink / raw)
  To: Joe Damato
  Cc: Maciej Fijalkowski, intel-wired-lan, netdev, kuba, davem,
	anthony.l.nguyen

On 10/5/2022 11:40 AM, Joe Damato wrote:

>> If you're really wanting to reorganize these structs I'd prefer a bit more
>> diligent effort to prove no inadvertent side effects (like maybe by turning
>> up the interrupt rate and looking at perf data while receiving 512 byte
>> packets. The rate should remain the same (or better) and the number of cache
>> misses on these structs should remain roughly the same. Maybe a seperate
>> patch series?
> 
> I honestly did think that reorganizing the struct was probably out of scope
> of this change, so if you agree so I'll drop this change from the v2 and
> keep the original which adds irq_num to the end of the struct.

I agree, especially in these routines, doing simple, 
explainable/observable changes is best.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [next-queue 1/3] i40e: Store the irq number in i40e_q_vector
  2022-10-05 19:37             ` [Intel-wired-lan] " Jesse Brandeburg
@ 2022-10-06 13:06               ` Maciej Fijalkowski
  -1 siblings, 0 replies; 34+ messages in thread
From: Maciej Fijalkowski @ 2022-10-06 13:06 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: Joe Damato, intel-wired-lan, netdev, kuba, davem, anthony.l.nguyen

On Wed, Oct 05, 2022 at 12:37:19PM -0700, Jesse Brandeburg wrote:
> On 10/5/2022 11:40 AM, Joe Damato wrote:
> 
> > > If you're really wanting to reorganize these structs I'd prefer a bit more
> > > diligent effort to prove no inadvertent side effects (like maybe by turning
> > > up the interrupt rate and looking at perf data while receiving 512 byte
> > > packets. The rate should remain the same (or better) and the number of cache
> > > misses on these structs should remain roughly the same. Maybe a seperate
> > > patch series?
> > 
> > I honestly did think that reorganizing the struct was probably out of scope
> > of this change, so if you agree so I'll drop this change from the v2 and
> > keep the original which adds irq_num to the end of the struct.
> 
> I agree, especially in these routines, doing simple, explainable/observable
> changes is best.

Jesse, I recall now that this weird q_vector struct layout is also the case
in the ice driver. Maybe we should document (or even explain) it
somewhere/somehow to discourage touching it? I believe I'm not the only one
who would have a knee-jerk reaction to try to pack it.

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2022-10-06 13:06 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-05  8:31 [next-queue 0/3] i40e: Add an i40e_napi_poll tracepoint Joe Damato
2022-10-05  8:31 ` [Intel-wired-lan] " Joe Damato
2022-10-05  8:31 ` [next-queue 1/3] i40e: Store the irq number in i40e_q_vector Joe Damato
2022-10-05  8:31   ` [Intel-wired-lan] " Joe Damato
2022-10-05 10:29   ` Maciej Fijalkowski
2022-10-05 10:29     ` [Intel-wired-lan] " Maciej Fijalkowski
2022-10-05 17:00     ` Joe Damato
2022-10-05 17:00       ` [Intel-wired-lan] " Joe Damato
2022-10-05 18:25       ` Jesse Brandeburg
2022-10-05 18:25         ` [Intel-wired-lan] " Jesse Brandeburg
2022-10-05 18:40         ` Joe Damato
2022-10-05 18:40           ` Joe Damato
2022-10-05 19:37           ` Jesse Brandeburg
2022-10-05 19:37             ` [Intel-wired-lan] " Jesse Brandeburg
2022-10-06 13:06             ` Maciej Fijalkowski
2022-10-06 13:06               ` [Intel-wired-lan] " Maciej Fijalkowski
2022-10-05  8:31 ` [next-queue 2/3] i40e: i40e_clean_tx_irq returns work done Joe Damato
2022-10-05  8:31   ` [Intel-wired-lan] " Joe Damato
2022-10-05  8:45   ` Paul Menzel
2022-10-05  8:45     ` Paul Menzel
2022-10-05 10:46   ` Maciej Fijalkowski
2022-10-05 10:46     ` [Intel-wired-lan] " Maciej Fijalkowski
2022-10-05 17:50     ` Joe Damato
2022-10-05 17:50       ` [Intel-wired-lan] " Joe Damato
2022-10-05 18:33       ` Jesse Brandeburg
2022-10-05 18:33         ` [Intel-wired-lan] " Jesse Brandeburg
2022-10-05 18:47         ` Joe Damato
2022-10-05 18:47           ` Joe Damato
2022-10-05  8:31 ` [next-queue 3/3] i40e: Add i40e_napi_poll tracepoint Joe Damato
2022-10-05  8:31   ` [Intel-wired-lan] " Joe Damato
2022-10-05 10:27   ` Maciej Fijalkowski
2022-10-05 10:27     ` [Intel-wired-lan] " Maciej Fijalkowski
2022-10-05 17:56     ` Joe Damato
2022-10-05 17:56       ` [Intel-wired-lan] " Joe Damato
