* [PATCH 1/3] tg3: Limit minimum tx queue wakeup threshold
@ 2014-08-19 18:52 Benjamin Poirier
  2014-08-19 18:52 ` [PATCH 2/3] tg3: Fix tx_pending checks for tg3_tso_bug Benjamin Poirier
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Benjamin Poirier @ 2014-08-19 18:52 UTC (permalink / raw)
  To: Prashant Sreedharan, Michael Chan; +Cc: netdev, linux-kernel

tx_pending may be set by the user (via ethtool -G) to a value low enough that
TG3_TX_WAKEUP_THRESH becomes smaller than MAX_SKB_FRAGS + 1. This may cause
the tx queue to be woken when there are in fact not enough descriptors to
handle an skb with max frags. This in turn causes tg3_start_xmit() to return
NETDEV_TX_BUSY and print error messages. Fix the problem by limiting how low
TG3_TX_WAKEUP_THRESH can go.

Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
---

I noticed the problem in a 3.0 kernel when setting `ethtool eth0 -G tx 50` and
running a netperf TCP_STREAM test. The console fills up with 
[10597.596155] tg3 0000:06:00.0: eth0: BUG! Tx Ring full when queue awake!
The problem in tg3 remains in current kernels though it does not reproduce as
easily since commit 5640f76 ("net: use a per task frag allocator"), merged in
v3.7-rc1. I
reproduced on current kernels by using the fail_page_alloc fault injection
mechanism to force the creation of skbs with many order-0 frags. Note that the
following script may also trigger another bug (NETDEV WATCHDOG), which is
fixed in the next patch.

$ cat /tmp/doit.sh
#!/bin/bash

F="/sys/kernel/debug/fail_page_alloc"

echo -1 > "$F/times"
echo 0 > "$F/verbose"
echo 0 > "$F/ignore-gfp-wait"
echo 1 > "$F/task-filter"
echo 100 > "$F/probability"

netperf -H 192.168.9.30 -l100 -t omni -- -d send &

n=$!

sleep 0.3
echo 1 > "/proc/$n/make-it-fail"
sleep 10

kill "$n"

---
 drivers/net/ethernet/broadcom/tg3.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 3ac5d23..b11c0fd 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -202,7 +202,8 @@ static inline void _tg3_flag_clear(enum TG3_FLAGS flag, unsigned long *bits)
 #endif
 
 /* minimum number of free TX descriptors required to wake up TX process */
-#define TG3_TX_WAKEUP_THRESH(tnapi)		((tnapi)->tx_pending / 4)
+#define TG3_TX_WAKEUP_THRESH(tnapi)	max_t(u32, (tnapi)->tx_pending / 4, \
+					      MAX_SKB_FRAGS + 1)
 #define TG3_TX_BD_DMA_MAX_2K		2048
 #define TG3_TX_BD_DMA_MAX_4K		4096
 
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/3] tg3: Fix tx_pending checks for tg3_tso_bug
  2014-08-19 18:52 [PATCH 1/3] tg3: Limit minimum tx queue wakeup threshold Benjamin Poirier
@ 2014-08-19 18:52 ` Benjamin Poirier
  2014-08-19 20:56   ` Benjamin Poirier
  2014-08-19 23:10   ` Michael Chan
  2014-08-19 18:52 ` [PATCH 3/3] tg3: Fix tx_pending check for MAX_SKB_FRAGS Benjamin Poirier
  2014-08-19 22:00 ` [PATCH 1/3] tg3: Limit minimum tx queue wakeup threshold Michael Chan
  2 siblings, 2 replies; 14+ messages in thread
From: Benjamin Poirier @ 2014-08-19 18:52 UTC (permalink / raw)
  To: Prashant Sreedharan, Michael Chan; +Cc: netdev, linux-kernel

In tg3_set_ringparam(), the tx_pending test that covers the cases where
tg3_tso_bug() is entered has two problems:
1) The check is only done for certain hardware whereas the workaround
is now used more broadly. IOW, the check may not be performed when it
is needed.
2) The check is too optimistic.

For example, with a 5761 (SHORT_DMA_BUG), tg3_set_ringparam() skips over the
"tx_pending <= (MAX_SKB_FRAGS * 3)" check because TSO_BUG is false. Even if it
did do the check, with a full sized skb, frag_cnt_est = 135 but the check is
for <= MAX_SKB_FRAGS * 3 (= 17 * 3 = 51). So the check is insufficient. This
leads to the following situation: by setting, e.g., tx_pending = 100, there can
be an skb that triggers tg3_tso_bug() and that is large enough to cause
tg3_tso_bug() to stop the queue even when it is empty. We then end up with a
netdev watchdog transmit timeout.

Given that 1) some of the conditions tested for in tg3_tx_frag_set() apply
regardless of the chipset flags and that 2) it is difficult to estimate ahead
of time the max possible number of frames that a large skb may be split into
by gso, we instead take the approach of adjusting dev->gso_max_segs according
to the requested tx_pending size.

This puts us in the exceptional situation that a single skb that triggers
tg3_tso_bug() may require the entire tx ring. Usually the tx queue is woken up
when at least a quarter of it is available (TG3_TX_WAKEUP_THRESH) but that
would be insufficient now. To avoid useless wakeups, the tx queue wake up
threshold is made dynamic.

Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
---

I reproduced this bug using the same approach explained in patch 1.
The bug reproduces with tx_pending <= 135

---
 drivers/net/ethernet/broadcom/tg3.c | 31 ++++++++++++++++++++-----------
 drivers/net/ethernet/broadcom/tg3.h |  1 +
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index b11c0fd..7022f6d 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -6609,10 +6609,10 @@ static void tg3_tx(struct tg3_napi *tnapi)
 	smp_mb();
 
 	if (unlikely(netif_tx_queue_stopped(txq) &&
-		     (tg3_tx_avail(tnapi) > TG3_TX_WAKEUP_THRESH(tnapi)))) {
+		     (tg3_tx_avail(tnapi) > tnapi->wakeup_thresh))) {
 		__netif_tx_lock(txq, smp_processor_id());
 		if (netif_tx_queue_stopped(txq) &&
-		    (tg3_tx_avail(tnapi) > TG3_TX_WAKEUP_THRESH(tnapi)))
+		    (tg3_tx_avail(tnapi) > tnapi->wakeup_thresh))
 			netif_tx_wake_queue(txq);
 		__netif_tx_unlock(txq);
 	}
@@ -7838,11 +7838,14 @@ static int tg3_tso_bug(struct tg3 *tp, struct tg3_napi *tnapi,
 		       struct netdev_queue *txq, struct sk_buff *skb)
 {
 	struct sk_buff *segs, *nskb;
-	u32 frag_cnt_est = skb_shinfo(skb)->gso_segs * 3;
 
-	/* Estimate the number of fragments in the worst case */
-	if (unlikely(tg3_tx_avail(tnapi) <= frag_cnt_est)) {
+	if (unlikely(tg3_tx_avail(tnapi) <= skb_shinfo(skb)->gso_segs)) {
+		trace_printk("stopping queue, %d <= %d\n",
+			     tg3_tx_avail(tnapi), skb_shinfo(skb)->gso_segs);
 		netif_tx_stop_queue(txq);
+		trace_printk("stopped queue\n");
+		tnapi->wakeup_thresh = skb_shinfo(skb)->gso_segs;
+		BUG_ON(tnapi->wakeup_thresh >= tnapi->tx_pending);
 
 		/* netif_tx_stop_queue() must be done before checking
 		 * checking tx index in tg3_tx_avail() below, because in
@@ -7850,7 +7853,7 @@ static int tg3_tso_bug(struct tg3 *tp, struct tg3_napi *tnapi,
 		 * netif_tx_queue_stopped().
 		 */
 		smp_mb();
-		if (tg3_tx_avail(tnapi) <= frag_cnt_est)
+		if (tg3_tx_avail(tnapi) <= tnapi->wakeup_thresh)
 			return NETDEV_TX_BUSY;
 
 		netif_tx_wake_queue(txq);
@@ -7905,12 +7908,17 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (unlikely(budget <= (skb_shinfo(skb)->nr_frags + 1))) {
 		if (!netif_tx_queue_stopped(txq)) {
 			netif_tx_stop_queue(txq);
+			tnapi->wakeup_thresh = TG3_TX_WAKEUP_THRESH(tnapi);
 
 			/* This is a hard error, log it. */
 			netdev_err(dev,
 				   "BUG! Tx Ring full when queue awake!\n");
 		}
-		return NETDEV_TX_BUSY;
+		smp_mb();
+		if (tg3_tx_avail(tnapi) <= tnapi->wakeup_thresh)
+			return NETDEV_TX_BUSY;
+
+		netif_tx_wake_queue(txq);
 	}
 
 	entry = tnapi->tx_prod;
@@ -8089,6 +8097,7 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	tnapi->tx_prod = entry;
 	if (unlikely(tg3_tx_avail(tnapi) <= (MAX_SKB_FRAGS + 1))) {
 		netif_tx_stop_queue(txq);
+		tnapi->wakeup_thresh = TG3_TX_WAKEUP_THRESH(tnapi);
 
 		/* netif_tx_stop_queue() must be done before checking
 		 * checking tx index in tg3_tx_avail() below, because in
@@ -8096,7 +8105,7 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		 * netif_tx_queue_stopped().
 		 */
 		smp_mb();
-		if (tg3_tx_avail(tnapi) > TG3_TX_WAKEUP_THRESH(tnapi))
+		if (tg3_tx_avail(tnapi) > tnapi->wakeup_thresh)
 			netif_tx_wake_queue(txq);
 	}
 
@@ -12319,9 +12328,7 @@ static int tg3_set_ringparam(struct net_device *dev, struct ethtool_ringparam *e
 	if ((ering->rx_pending > tp->rx_std_ring_mask) ||
 	    (ering->rx_jumbo_pending > tp->rx_jmb_ring_mask) ||
 	    (ering->tx_pending > TG3_TX_RING_SIZE - 1) ||
-	    (ering->tx_pending <= MAX_SKB_FRAGS) ||
-	    (tg3_flag(tp, TSO_BUG) &&
-	     (ering->tx_pending <= (MAX_SKB_FRAGS * 3))))
+	    (ering->tx_pending <= MAX_SKB_FRAGS))
 		return -EINVAL;
 
 	if (netif_running(dev)) {
@@ -12341,6 +12348,7 @@ static int tg3_set_ringparam(struct net_device *dev, struct ethtool_ringparam *e
 	if (tg3_flag(tp, JUMBO_RING_ENABLE))
 		tp->rx_jumbo_pending = ering->rx_jumbo_pending;
 
+	dev->gso_max_segs = ering->tx_pending - 1;
 	for (i = 0; i < tp->irq_max; i++)
 		tp->napi[i].tx_pending = ering->tx_pending;
 
@@ -17817,6 +17825,7 @@ static int tg3_init_one(struct pci_dev *pdev,
 		else
 			sndmbx += 0xc;
 	}
+	dev->gso_max_segs = TG3_DEF_TX_RING_PENDING - 1;
 
 	tg3_init_coal(tp);
 
diff --git a/drivers/net/ethernet/broadcom/tg3.h b/drivers/net/ethernet/broadcom/tg3.h
index 461acca..6a7e13d 100644
--- a/drivers/net/ethernet/broadcom/tg3.h
+++ b/drivers/net/ethernet/broadcom/tg3.h
@@ -3006,6 +3006,7 @@ struct tg3_napi {
 	u32				tx_pending;
 	u32				last_tx_cons;
 	u32				prodmbox;
+	u32				wakeup_thresh;
 	struct tg3_tx_buffer_desc	*tx_ring;
 	struct tg3_tx_ring_info		*tx_buffers;
 
-- 
1.8.4.5



* [PATCH 3/3] tg3: Fix tx_pending check for MAX_SKB_FRAGS
  2014-08-19 18:52 [PATCH 1/3] tg3: Limit minimum tx queue wakeup threshold Benjamin Poirier
  2014-08-19 18:52 ` [PATCH 2/3] tg3: Fix tx_pending checks for tg3_tso_bug Benjamin Poirier
@ 2014-08-19 18:52 ` Benjamin Poirier
  2014-08-19 22:00 ` [PATCH 1/3] tg3: Limit minimum tx queue wakeup threshold Michael Chan
  2 siblings, 0 replies; 14+ messages in thread
From: Benjamin Poirier @ 2014-08-19 18:52 UTC (permalink / raw)
  To: Prashant Sreedharan, Michael Chan; +Cc: netdev, linux-kernel

The rest of the driver assumes at least one free descriptor in the tx ring.
Therefore, since an skb with max frags takes up (MAX_SKB_FRAGS + 1)
descriptors, tx_pending must be > (MAX_SKB_FRAGS + 1).

Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
---

Currently, it is possible to set tx_pending = MAX_SKB_FRAGS + 1. This leads to
a netdev watchdog tx timeout. Depending on whether the previous patches in
this series are applied or not, the timeout happens as soon as tx_pending is
updated or after an skb with max frags is submitted for transmission.

I reproduced this bug using the same approach explained in patch 1.
The bug reproduces with tx_pending = 18

---
 drivers/net/ethernet/broadcom/tg3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 7022f6d..27e2701 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -12328,7 +12328,7 @@ static int tg3_set_ringparam(struct net_device *dev, struct ethtool_ringparam *e
 	if ((ering->rx_pending > tp->rx_std_ring_mask) ||
 	    (ering->rx_jumbo_pending > tp->rx_jmb_ring_mask) ||
 	    (ering->tx_pending > TG3_TX_RING_SIZE - 1) ||
-	    (ering->tx_pending <= MAX_SKB_FRAGS))
+	    (ering->tx_pending <= MAX_SKB_FRAGS + 1))
 		return -EINVAL;
 
 	if (netif_running(dev)) {
-- 
1.8.4.5



* Re: [PATCH 2/3] tg3: Fix tx_pending checks for tg3_tso_bug
  2014-08-19 18:52 ` [PATCH 2/3] tg3: Fix tx_pending checks for tg3_tso_bug Benjamin Poirier
@ 2014-08-19 20:56   ` Benjamin Poirier
  2014-08-19 23:10   ` Michael Chan
  1 sibling, 0 replies; 14+ messages in thread
From: Benjamin Poirier @ 2014-08-19 20:56 UTC (permalink / raw)
  To: Prashant Sreedharan, Michael Chan; +Cc: netdev, linux-kernel

On 2014/08/19 11:52, Benjamin Poirier wrote:
> +		trace_printk("stopping queue, %d <= %d\n",
> +			     tg3_tx_avail(tnapi), skb_shinfo(skb)->gso_segs);
>  		netif_tx_stop_queue(txq);
> +		trace_printk("stopped queue\n");

err, I'll resubmit without the trace_printk. Please review for other
issues nevertheless.


* Re: [PATCH 1/3] tg3: Limit minimum tx queue wakeup threshold
  2014-08-19 18:52 [PATCH 1/3] tg3: Limit minimum tx queue wakeup threshold Benjamin Poirier
  2014-08-19 18:52 ` [PATCH 2/3] tg3: Fix tx_pending checks for tg3_tso_bug Benjamin Poirier
  2014-08-19 18:52 ` [PATCH 3/3] tg3: Fix tx_pending check for MAX_SKB_FRAGS Benjamin Poirier
@ 2014-08-19 22:00 ` Michael Chan
  2014-08-21 22:04   ` Benjamin Poirier
  2 siblings, 1 reply; 14+ messages in thread
From: Michael Chan @ 2014-08-19 22:00 UTC (permalink / raw)
  To: Benjamin Poirier; +Cc: Prashant Sreedharan, netdev, linux-kernel

On Tue, 2014-08-19 at 11:52 -0700, Benjamin Poirier wrote: 
> diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
> index 3ac5d23..b11c0fd 100644
> --- a/drivers/net/ethernet/broadcom/tg3.c
> +++ b/drivers/net/ethernet/broadcom/tg3.c
> @@ -202,7 +202,8 @@ static inline void _tg3_flag_clear(enum TG3_FLAGS flag, unsigned long *bits)
>  #endif
>  
>  /* minimum number of free TX descriptors required to wake up TX process */
> -#define TG3_TX_WAKEUP_THRESH(tnapi)            ((tnapi)->tx_pending / 4)
> +#define TG3_TX_WAKEUP_THRESH(tnapi)    max_t(u32, (tnapi)->tx_pending / 4, \
> +                                             MAX_SKB_FRAGS + 1)

I think we should precompute this and store it in something like
tp->tx_wake_thresh.

>  #define TG3_TX_BD_DMA_MAX_2K           2048
>  #define TG3_TX_BD_DMA_MAX_4K           4096
>   



* Re: [PATCH 2/3] tg3: Fix tx_pending checks for tg3_tso_bug
  2014-08-19 18:52 ` [PATCH 2/3] tg3: Fix tx_pending checks for tg3_tso_bug Benjamin Poirier
  2014-08-19 20:56   ` Benjamin Poirier
@ 2014-08-19 23:10   ` Michael Chan
  2014-08-21  1:23     ` Benjamin Poirier
  1 sibling, 1 reply; 14+ messages in thread
From: Michael Chan @ 2014-08-19 23:10 UTC (permalink / raw)
  To: Benjamin Poirier; +Cc: Prashant Sreedharan, netdev, linux-kernel

On Tue, 2014-08-19 at 11:52 -0700, Benjamin Poirier wrote: 
> @@ -7838,11 +7838,14 @@ static int tg3_tso_bug(struct tg3 *tp, struct tg3_napi *tnapi,
>                        struct netdev_queue *txq, struct sk_buff *skb)
>  {
>         struct sk_buff *segs, *nskb;
> -       u32 frag_cnt_est = skb_shinfo(skb)->gso_segs * 3;
>  
> -       /* Estimate the number of fragments in the worst case */
> -       if (unlikely(tg3_tx_avail(tnapi) <= frag_cnt_est)) {
> +       if (unlikely(tg3_tx_avail(tnapi) <= skb_shinfo(skb)->gso_segs)) {
> +               trace_printk("stopping queue, %d <= %d\n",
> +                            tg3_tx_avail(tnapi), skb_shinfo(skb)->gso_segs);
>                 netif_tx_stop_queue(txq);
> +               trace_printk("stopped queue\n");
> +               tnapi->wakeup_thresh = skb_shinfo(skb)->gso_segs;
> +               BUG_ON(tnapi->wakeup_thresh >= tnapi->tx_pending);
>  
>                 /* netif_tx_stop_queue() must be done before checking
>                  * checking tx index in tg3_tx_avail() below, because in 

I don't quite understand this logic and I must be missing something.
gso_segs is the number of TCP segments the large packet will be broken
up into.  If it exceeds dev->gso_max_segs, it means it exceeds
hardware's capability and it will do GSO instead of TSO.  But in this
case in tg3_tso_bug(), we are doing GSO and we may not have enough DMA
descriptors to do GSO.  Each gso_seg typically requires 2 DMA
descriptors.



* Re: [PATCH 2/3] tg3: Fix tx_pending checks for tg3_tso_bug
  2014-08-19 23:10   ` Michael Chan
@ 2014-08-21  1:23     ` Benjamin Poirier
  2014-08-21  9:51       ` Michael Chan
  0 siblings, 1 reply; 14+ messages in thread
From: Benjamin Poirier @ 2014-08-21  1:23 UTC (permalink / raw)
  To: Michael Chan; +Cc: Prashant Sreedharan, netdev, linux-kernel

On 2014/08/19 16:10, Michael Chan wrote:
> On Tue, 2014-08-19 at 11:52 -0700, Benjamin Poirier wrote: 
> > @@ -7838,11 +7838,14 @@ static int tg3_tso_bug(struct tg3 *tp, struct tg3_napi *tnapi,
> >                        struct netdev_queue *txq, struct sk_buff *skb)
> >  {
> >         struct sk_buff *segs, *nskb;
> > -       u32 frag_cnt_est = skb_shinfo(skb)->gso_segs * 3;
> >  
> > -       /* Estimate the number of fragments in the worst case */
> > -       if (unlikely(tg3_tx_avail(tnapi) <= frag_cnt_est)) {
> > +       if (unlikely(tg3_tx_avail(tnapi) <= skb_shinfo(skb)->gso_segs)) {
> > +               trace_printk("stopping queue, %d <= %d\n",
> > +                            tg3_tx_avail(tnapi), skb_shinfo(skb)->gso_segs);
> >                 netif_tx_stop_queue(txq);
> > +               trace_printk("stopped queue\n");
> > +               tnapi->wakeup_thresh = skb_shinfo(skb)->gso_segs;
> > +               BUG_ON(tnapi->wakeup_thresh >= tnapi->tx_pending);
> >  
> >                 /* netif_tx_stop_queue() must be done before checking
> >                  * checking tx index in tg3_tx_avail() below, because in 
> 
> I don't quite understand this logic and I must be missing something.
> gso_segs is the number of TCP segments the large packet will be broken
> up into.  If it exceeds dev->gso_max_segs, it means it exceeds
> hardware's capability and it will do GSO instead of TSO.  But in this
> case in tg3_tso_bug(), we are doing GSO and we may not have enough DMA
> descriptors to do GSO.  Each gso_seg typically requires 2 DMA
> descriptors.

You're right, I had wrongly assumed that the skbs coming out of
skb_gso_segment() were linear. I'll address that in v2 of the patch by masking
out NETIF_F_SG in tg3_tso_bug().

I noticed another issue that had not occurred to me: when tg3_tso_bug is
submitting a full gso segs sequence to tg3_start_xmit, the code at the end of
that function stops the queue before the end of the sequence because tx_avail
becomes smaller than (MAX_SKB_FRAGS + 1). The transmission actually proceeds
because tg3_tso_bug() does not honour the queue state but it seems rather
unsightly to me. I'm trying different solutions to this and will resubmit.


* Re: [PATCH 2/3] tg3: Fix tx_pending checks for tg3_tso_bug
  2014-08-21  1:23     ` Benjamin Poirier
@ 2014-08-21  9:51       ` Michael Chan
  2014-08-21 21:59         ` Benjamin Poirier
  0 siblings, 1 reply; 14+ messages in thread
From: Michael Chan @ 2014-08-21  9:51 UTC (permalink / raw)
  To: Benjamin Poirier; +Cc: Prashant Sreedharan, netdev, linux-kernel

On Wed, 2014-08-20 at 18:23 -0700, Benjamin Poirier wrote: 
> On 2014/08/19 16:10, Michael Chan wrote:
> > On Tue, 2014-08-19 at 11:52 -0700, Benjamin Poirier wrote: 
> > > @@ -7838,11 +7838,14 @@ static int tg3_tso_bug(struct tg3 *tp, struct tg3_napi *tnapi,
> > >                        struct netdev_queue *txq, struct sk_buff *skb)
> > >  {
> > >         struct sk_buff *segs, *nskb;
> > > -       u32 frag_cnt_est = skb_shinfo(skb)->gso_segs * 3;
> > >  
> > > -       /* Estimate the number of fragments in the worst case */
> > > -       if (unlikely(tg3_tx_avail(tnapi) <= frag_cnt_est)) {
> > > +       if (unlikely(tg3_tx_avail(tnapi) <= skb_shinfo(skb)->gso_segs)) {
> > > +               trace_printk("stopping queue, %d <= %d\n",
> > > +                            tg3_tx_avail(tnapi), skb_shinfo(skb)->gso_segs);
> > >                 netif_tx_stop_queue(txq);
> > > +               trace_printk("stopped queue\n");
> > > +               tnapi->wakeup_thresh = skb_shinfo(skb)->gso_segs;
> > > +               BUG_ON(tnapi->wakeup_thresh >= tnapi->tx_pending);
> > >  
> > >                 /* netif_tx_stop_queue() must be done before checking
> > >                  * checking tx index in tg3_tx_avail() below, because in 
> > 
> > I don't quite understand this logic and I must be missing something.
> > gso_segs is the number of TCP segments the large packet will be broken
> > up into.  If it exceeds dev->gso_max_segs, it means it exceeds
> > hardware's capabilty and it will do GSO instead of TSO.  But in this
> > case in tg3_tso_bug(), we are doing GSO and we may not have enough DMA
> > descriptors to do GSO.  Each gso_seg typically requires 2 DMA
> > descriptors.
> 
> You're right, I had wrongly assumed that the skbs coming out of
> skb_gso_segment() were linear. I'll address that in v2 of the patch by masking
> out NETIF_F_SG in tg3_tso_bug().
> 

While masking out NETIF_F_SG will work, it will also disable checksum
offload for the whole device momentarily.

> I noticed another issue that had not occurred to me: when tg3_tso_bug is
> submitting a full gso segs sequence to tg3_start_xmit, the code at the end of
> that function stops the queue before the end of the sequence because tx_avail
> becomes smaller than (MAX_SKB_FRAGS + 1). The transmission actually proceeds
> because tg3_tso_bug() does not honour the queue state but it seems rather
> unsightly to me.

That's why the number of DMA descriptors that we estimate has to be
accurate.  It's unfortunate that the various tg3 chips require so many
different workarounds.  The objective is to keep TSO and checksum
enabled and workaround the occasional packets using GSO.

I believe that the boundary error conditions that you brought up can be
addressed by enforcing some limits on the tx ring size and by reducing
gso_max_size/gso_max_segs when necessary (for example when MTU and/or
ring size is set very small).





* Re: [PATCH 2/3] tg3: Fix tx_pending checks for tg3_tso_bug
  2014-08-21  9:51       ` Michael Chan
@ 2014-08-21 21:59         ` Benjamin Poirier
  2014-08-22  4:25           ` David Miller
  0 siblings, 1 reply; 14+ messages in thread
From: Benjamin Poirier @ 2014-08-21 21:59 UTC (permalink / raw)
  To: Michael Chan; +Cc: Prashant Sreedharan, netdev, linux-kernel

On 2014/08/21 02:51, Michael Chan wrote:
> On Wed, 2014-08-20 at 18:23 -0700, Benjamin Poirier wrote: 
> > On 2014/08/19 16:10, Michael Chan wrote:
> > > On Tue, 2014-08-19 at 11:52 -0700, Benjamin Poirier wrote: 
> > > > @@ -7838,11 +7838,14 @@ static int tg3_tso_bug(struct tg3 *tp, struct tg3_napi *tnapi,
> > > >                        struct netdev_queue *txq, struct sk_buff *skb)
> > > >  {
> > > >         struct sk_buff *segs, *nskb;
> > > > -       u32 frag_cnt_est = skb_shinfo(skb)->gso_segs * 3;
> > > >  
> > > > -       /* Estimate the number of fragments in the worst case */
> > > > -       if (unlikely(tg3_tx_avail(tnapi) <= frag_cnt_est)) {
> > > > +       if (unlikely(tg3_tx_avail(tnapi) <= skb_shinfo(skb)->gso_segs)) {
> > > > +               trace_printk("stopping queue, %d <= %d\n",
> > > > +                            tg3_tx_avail(tnapi), skb_shinfo(skb)->gso_segs);
> > > >                 netif_tx_stop_queue(txq);
> > > > +               trace_printk("stopped queue\n");
> > > > +               tnapi->wakeup_thresh = skb_shinfo(skb)->gso_segs;
> > > > +               BUG_ON(tnapi->wakeup_thresh >= tnapi->tx_pending);
> > > >  
> > > >                 /* netif_tx_stop_queue() must be done before checking
> > > >                  * checking tx index in tg3_tx_avail() below, because in 
> > > 
> > > I don't quite understand this logic and I must be missing something.
> > > gso_segs is the number of TCP segments the large packet will be broken
> > > up into.  If it exceeds dev->gso_max_segs, it means it exceeds
> > > hardware's capability and it will do GSO instead of TSO.  But in this
> > > case in tg3_tso_bug(), we are doing GSO and we may not have enough DMA
> > > descriptors to do GSO.  Each gso_seg typically requires 2 DMA
> > > descriptors.
> > 
> > You're right, I had wrongly assumed that the skbs coming out of
> > skb_gso_segment() were linear. I'll address that in v2 of the patch by masking
> > out NETIF_F_SG in tg3_tso_bug().
> > 
> 
> While masking out NETF_F_SG will work, it will also disable checksum
> offload for the whole device momentarily.
> 
> > I noticed another issue that had not occurred to me: when tg3_tso_bug is
> > submitting a full gso segs sequence to tg3_start_xmit, the code at the end of
> > that function stops the queue before the end of the sequence because tx_avail
> > becomes smaller than (MAX_SKB_FRAGS + 1). The transmission actually proceeds
> > because tg3_tso_bug() does not honour the queue state but it seems rather
> > unsightly to me.
> 
> That's why the number of DMA descriptors that we estimate has to be
> accurate.  It's unfortunate that the various tg3 chips require so many
> different workarounds.  The objective is to keep TSO and checksum
> enabled and workaround the occasional packets using GSO.

Ah, now I understand the reason for the * 3 in
	u32 frag_cnt_est = skb_shinfo(skb)->gso_segs * 3;

	/* Estimate the number of fragments in the worst case */
but that is not really the worst case: skbs output from skb_gso_segment() are
not limited to two frags each. I've kept this estimation approach but added
code to validate the estimate and linearize the skb otherwise.

> 
> I believe that the boundary error conditions that you brought up can be
> addressed by enforcing some limits on the tx ring size and by reducing
> gso_max_size/gso_max_segs when necessary (for example when MTU and/or
> ring size is set very small).


* Re: [PATCH 1/3] tg3: Limit minimum tx queue wakeup threshold
  2014-08-19 22:00 ` [PATCH 1/3] tg3: Limit minimum tx queue wakeup threshold Michael Chan
@ 2014-08-21 22:04   ` Benjamin Poirier
  2014-08-21 22:32     ` Michael Chan
  0 siblings, 1 reply; 14+ messages in thread
From: Benjamin Poirier @ 2014-08-21 22:04 UTC (permalink / raw)
  To: Michael Chan; +Cc: Prashant Sreedharan, netdev, linux-kernel

On 2014/08/19 15:00, Michael Chan wrote:
> On Tue, 2014-08-19 at 11:52 -0700, Benjamin Poirier wrote: 
> > diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
> > index 3ac5d23..b11c0fd 100644
> > --- a/drivers/net/ethernet/broadcom/tg3.c
> > +++ b/drivers/net/ethernet/broadcom/tg3.c
> > @@ -202,7 +202,8 @@ static inline void _tg3_flag_clear(enum TG3_FLAGS flag, unsigned long *bits)
> >  #endif
> >  
> >  /* minimum number of free TX descriptors required to wake up TX process */
> > -#define TG3_TX_WAKEUP_THRESH(tnapi)            ((tnapi)->tx_pending / 4)
> > +#define TG3_TX_WAKEUP_THRESH(tnapi)    max_t(u32, (tnapi)->tx_pending / 4, \
> > +                                             MAX_SKB_FRAGS + 1)
> 
> I think we should precompute this and store it in something like
> tp->tx_wake_thresh.

I've tried this by adding the following patch at the end of the v2
series but I did not measure a significant latency improvement. Was
there another reason for the change?

Here are the performance results. The first set of numbers are the same
as those found in patch v2 3/3.

# perf stat -r10 -ad netperf -H 192.168.9.30 -l60 -T 0,0 -t omni -- -d rr

* with patches 1-3
	rr values: 6732.65 6920.1 6909.46 7032.41 6864.43 6897.6 6815.19 6967.83 6849.23 6929.52
	sample size: 10
	mean: 6891.842
	standard deviation: 82.91901
	quantiles: 6732.65 6853.03 6903.53 6927.165 7032.41
	6890±80

 Performance counter stats for 'netperf -H 192.168.9.30 -l60 -T 0,0 -t omni -- -d rr' (10 runs):

     480675.949728 task-clock                #    8.001 CPUs utilized            ( +-  0.01% ) [100.00%]
           850,461 context-switches          #    0.002 M/sec                    ( +-  0.37% ) [100.00%]
               564 CPU-migrations            #    0.000 M/sec                    ( +-  5.67% ) [100.00%]
               417 page-faults               #    0.000 M/sec                    ( +- 76.04% )
   287,019,442,295 cycles                    #    0.597 GHz                      ( +-  7.16% ) [15.01%]
   828,198,830,689 stalled-cycles-frontend   #  288.55% frontend cycles idle     ( +-  3.01% ) [25.01%]
   718,230,307,166 stalled-cycles-backend    #  250.24% backend  cycles idle     ( +-  3.53% ) [35.00%]
   117,976,598,188 instructions              #    0.41  insns per cycle
                                             #    7.02  stalled cycles per insn  ( +-  4.06% ) [45.00%]
    26,715,853,108 branches                  #   55.580 M/sec                    ( +-  3.77% ) [50.00%]
       198,787,673 branch-misses             #    0.74% of all branches          ( +-  0.86% ) [50.00%]
    28,416,922,166 L1-dcache-loads           #   59.119 M/sec                    ( +-  3.54% ) [50.00%]
       367,613,007 L1-dcache-load-misses     #    1.29% of all L1-dcache hits    ( +-  0.47% ) [50.00%]
        75,260,575 LLC-loads                 #    0.157 M/sec                    ( +-  2.24% ) [40.00%]
             5,777 LLC-load-misses           #    0.01% of all LL-cache hits     ( +- 36.03% ) [ 5.00%]

      60.077898757 seconds time elapsed                                          ( +-  0.01% )

* with patches 1-3 + tx_wake_thresh_def
	rr values: 6636.87 6874.05 6916.29 6961.68 6941.3 6841.44 6829.05 6806.55 6846.04 6958.39
	sample size: 10
	mean: 6861.166
	standard deviation: 96.67967
	quantiles: 6636.87 6832.148 6860.045 6935.048 6961.68
	6900±100

 Performance counter stats for 'netperf -H 192.168.9.30 -l60 -T 0,0 -t omni -- -d rr' (10 runs):

     480688.653656 task-clock                #    8.001 CPUs utilized            ( +-  0.01% ) [100.00%]
           846,980 context-switches          #    0.002 M/sec                    ( +-  0.40% ) [100.00%]
               524 CPU-migrations            #    0.000 M/sec                    ( +- 11.82% ) [100.00%]
               420 page-faults               #    0.000 M/sec                    ( +- 75.31% )
   275,602,421,981 cycles                    #    0.573 GHz                      ( +-  3.23% ) [15.01%]
   806,335,406,844 stalled-cycles-frontend   #  292.57% frontend cycles idle     ( +-  2.16% ) [25.01%]
   640,757,376,054 stalled-cycles-backend    #  232.49% backend  cycles idle     ( +-  2.46% ) [35.00%]
   113,241,018,220 instructions              #    0.41  insns per cycle
                                             #    7.12  stalled cycles per insn  ( +-  1.93% ) [45.00%]
    25,479,064,973 branches                  #   53.005 M/sec                    ( +-  1.96% ) [50.00%]
       205,483,191 branch-misses             #    0.81% of all branches          ( +-  0.75% ) [50.00%]
    27,209,883,125 L1-dcache-loads           #   56.606 M/sec                    ( +-  1.87% ) [50.00%]
       361,721,478 L1-dcache-load-misses     #    1.33% of all L1-dcache hits    ( +-  0.51% ) [50.00%]
        80,669,260 LLC-loads                 #    0.168 M/sec                    ( +-  1.01% ) [40.00%]
             8,846 LLC-load-misses           #    0.01% of all LL-cache hits     ( +- 34.01% ) [ 5.00%]

      60.079761525 seconds time elapsed                                          ( +-  0.01% )

---
 drivers/net/ethernet/broadcom/tg3.c | 27 ++++++++++++++++-----------
 drivers/net/ethernet/broadcom/tg3.h |  5 +++--
 2 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index c29f2e3..81e390b 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -6478,10 +6478,11 @@ static void tg3_dump_state(struct tg3 *tp)
 			   tnapi->hw_status->idx[0].tx_consumer);
 
 		netdev_err(tp->dev,
-		"%d: NAPI info [%08x:%08x:(%04x:%04x:%04x):%04x:(%04x:%04x:%04x:%04x)]\n",
+		"%d: NAPI info [%08x:%08x:(%04x:%04x:%04x:%04x):%04x:(%04x:%04x:%04x:%04x)]\n",
 			   i,
 			   tnapi->last_tag, tnapi->last_irq_tag,
 			   tnapi->tx_prod, tnapi->tx_cons, tnapi->tx_pending,
+			   tnapi->tx_wake_thresh_cur,
 			   tnapi->rx_rcb_ptr,
 			   tnapi->prodring.rx_std_prod_idx,
 			   tnapi->prodring.rx_std_cons_idx,
@@ -6613,10 +6614,10 @@ static void tg3_tx(struct tg3_napi *tnapi)
 	smp_mb();
 
 	if (unlikely(netif_tx_queue_stopped(txq) &&
-		     (tg3_tx_avail(tnapi) > tnapi->wakeup_thresh))) {
+		     (tg3_tx_avail(tnapi) > tnapi->tx_wake_thresh_cur))) {
 		__netif_tx_lock(txq, smp_processor_id());
 		if (netif_tx_queue_stopped(txq) &&
-		    (tg3_tx_avail(tnapi) > tnapi->wakeup_thresh))
+		    (tg3_tx_avail(tnapi) > tnapi->tx_wake_thresh_cur))
 			netif_tx_wake_queue(txq);
 		__netif_tx_unlock(txq);
 	}
@@ -7849,8 +7850,8 @@ static int tg3_tso_bug(struct tg3 *tp, struct tg3_napi *tnapi,
 
 	if (unlikely(tg3_tx_avail(tnapi) <= desc_cnt_est)) {
 		netif_tx_stop_queue(txq);
-		tnapi->wakeup_thresh = desc_cnt_est;
-		BUG_ON(tnapi->wakeup_thresh >= tnapi->tx_pending);
+		tnapi->tx_wake_thresh_cur = desc_cnt_est;
+		BUG_ON(tnapi->tx_wake_thresh_cur >= tnapi->tx_pending);
 
 		/* netif_tx_stop_queue() must be done before checking
 		 * checking tx index in tg3_tx_avail() below, because in
@@ -7858,7 +7859,7 @@ static int tg3_tso_bug(struct tg3 *tp, struct tg3_napi *tnapi,
 		 * netif_tx_queue_stopped().
 		 */
 		smp_mb();
-		if (tg3_tx_avail(tnapi) <= tnapi->wakeup_thresh)
+		if (tg3_tx_avail(tnapi) <= tnapi->tx_wake_thresh_cur)
 			return NETDEV_TX_BUSY;
 
 		netif_tx_wake_queue(txq);
@@ -7938,14 +7939,14 @@ static netdev_tx_t __tg3_start_xmit(struct sk_buff *skb,
 	if (unlikely(budget <= (skb_shinfo(skb)->nr_frags + 1))) {
 		if (!netif_tx_queue_stopped(txq)) {
 			netif_tx_stop_queue(txq);
-			tnapi->wakeup_thresh = TG3_TX_WAKEUP_THRESH(tnapi);
+			tnapi->tx_wake_thresh_cur = tnapi->tx_wake_thresh_def;
 
 			/* This is a hard error, log it. */
 			netdev_err(dev,
 				   "BUG! Tx Ring full when queue awake!\n");
 		}
 		smp_mb();
-		if (tg3_tx_avail(tnapi) <= tnapi->wakeup_thresh)
+		if (tg3_tx_avail(tnapi) <= tnapi->tx_wake_thresh_cur)
 			return NETDEV_TX_BUSY;
 
 		netif_tx_wake_queue(txq);
@@ -8127,7 +8128,7 @@ static netdev_tx_t __tg3_start_xmit(struct sk_buff *skb,
 	tnapi->tx_prod = entry;
 	if (unlikely(tg3_tx_avail(tnapi) <= stop_thresh)) {
 		netif_tx_stop_queue(txq);
-		tnapi->wakeup_thresh = TG3_TX_WAKEUP_THRESH(tnapi);
+		tnapi->tx_wake_thresh_cur = tnapi->tx_wake_thresh_def;
 
 		/* netif_tx_stop_queue() must be done before checking
 		 * checking tx index in tg3_tx_avail() below, because in
@@ -8135,7 +8136,7 @@ static netdev_tx_t __tg3_start_xmit(struct sk_buff *skb,
 		 * netif_tx_queue_stopped().
 		 */
 		smp_mb();
-		if (tg3_tx_avail(tnapi) > tnapi->wakeup_thresh)
+		if (tg3_tx_avail(tnapi) > tnapi->tx_wake_thresh_cur)
 			netif_tx_wake_queue(txq);
 	}
 
@@ -12379,8 +12380,11 @@ static int tg3_set_ringparam(struct net_device *dev, struct ethtool_ringparam *e
 		tp->rx_jumbo_pending = ering->rx_jumbo_pending;
 
 	dev->gso_max_segs = TG3_TX_SEG_PER_DESC(ering->tx_pending - 1);
-	for (i = 0; i < tp->irq_max; i++)
+	for (i = 0; i < tp->irq_max; i++) {
 		tp->napi[i].tx_pending = ering->tx_pending;
+		tp->napi[i].tx_wake_thresh_def =
+			TG3_TX_WAKEUP_THRESH(&tp->napi[i]);
+	}
 
 	if (netif_running(dev)) {
 		tg3_halt(tp, RESET_KIND_SHUTDOWN, 1);
@@ -17820,6 +17824,7 @@ static int tg3_init_one(struct pci_dev *pdev,
 
 		tnapi->tp = tp;
 		tnapi->tx_pending = TG3_DEF_TX_RING_PENDING;
+		tnapi->tx_wake_thresh_def = TG3_TX_WAKEUP_THRESH(tnapi);
 
 		tnapi->int_mbox = intmbx;
 		if (i <= 4)
diff --git a/drivers/net/ethernet/broadcom/tg3.h b/drivers/net/ethernet/broadcom/tg3.h
index 6a7e13d..44a21cb 100644
--- a/drivers/net/ethernet/broadcom/tg3.h
+++ b/drivers/net/ethernet/broadcom/tg3.h
@@ -3004,11 +3004,12 @@ struct tg3_napi {
 	u32				tx_prod	____cacheline_aligned;
 	u32				tx_cons;
 	u32				tx_pending;
-	u32				last_tx_cons;
 	u32				prodmbox;
-	u32				wakeup_thresh;
+	u32				tx_wake_thresh_cur;
+	u32				tx_wake_thresh_def;
 	struct tg3_tx_buffer_desc	*tx_ring;
 	struct tg3_tx_ring_info		*tx_buffers;
+	u32				last_tx_cons;
 
 	dma_addr_t			status_mapping;
 	dma_addr_t			rx_rcb_mapping;


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] tg3: Limit minimum tx queue wakeup threshold
  2014-08-21 22:04   ` Benjamin Poirier
@ 2014-08-21 22:32     ` Michael Chan
  2014-08-21 23:06       ` Benjamin Poirier
  0 siblings, 1 reply; 14+ messages in thread
From: Michael Chan @ 2014-08-21 22:32 UTC (permalink / raw)
  To: Benjamin Poirier; +Cc: Prashant Sreedharan, netdev, linux-kernel

On Thu, 2014-08-21 at 15:04 -0700, Benjamin Poirier wrote: 
> On 2014/08/19 15:00, Michael Chan wrote:
> > On Tue, 2014-08-19 at 11:52 -0700, Benjamin Poirier wrote: 
> > > diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
> > > index 3ac5d23..b11c0fd 100644
> > > --- a/drivers/net/ethernet/broadcom/tg3.c
> > > +++ b/drivers/net/ethernet/broadcom/tg3.c
> > > @@ -202,7 +202,8 @@ static inline void _tg3_flag_clear(enum TG3_FLAGS flag, unsigned long *bits)
> > >  #endif
> > >  
> > >  /* minimum number of free TX descriptors required to wake up TX process */
> > > -#define TG3_TX_WAKEUP_THRESH(tnapi)            ((tnapi)->tx_pending / 4)
> > > +#define TG3_TX_WAKEUP_THRESH(tnapi)    max_t(u32, (tnapi)->tx_pending / 4, \
> > > +                                             MAX_SKB_FRAGS + 1)
> > 
> > I think we should precompute this and store it in something like
> > tp->tx_wake_thresh.
> 
> I've tried this by adding the following patch at the end of the v2
> series but I did not measure a significant latency improvement. Was
> there another reason for the change? 

Just performance.  The wake up threshold is checked in the tx fast path
in both start_xmit() and tg3_tx().  I would optimize such code for speed
as much as possible.  In the current code, it was just a right shift
operation.  Now, with max_t() added, I think I prefer having it
pre-computed.  The performance difference may not be measurable, but I
think the compiled code size may be smaller too.



* Re: [PATCH 1/3] tg3: Limit minimum tx queue wakeup threshold
  2014-08-21 22:32     ` Michael Chan
@ 2014-08-21 23:06       ` Benjamin Poirier
  2014-08-21 23:26         ` Michael Chan
  0 siblings, 1 reply; 14+ messages in thread
From: Benjamin Poirier @ 2014-08-21 23:06 UTC (permalink / raw)
  To: Michael Chan; +Cc: Prashant Sreedharan, netdev, linux-kernel

On 2014/08/21 15:32, Michael Chan wrote:
> On Thu, 2014-08-21 at 15:04 -0700, Benjamin Poirier wrote: 
> > On 2014/08/19 15:00, Michael Chan wrote:
> > > On Tue, 2014-08-19 at 11:52 -0700, Benjamin Poirier wrote: 
> > > > diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
> > > > index 3ac5d23..b11c0fd 100644
> > > > --- a/drivers/net/ethernet/broadcom/tg3.c
> > > > +++ b/drivers/net/ethernet/broadcom/tg3.c
> > > > @@ -202,7 +202,8 @@ static inline void _tg3_flag_clear(enum TG3_FLAGS flag, unsigned long *bits)
> > > >  #endif
> > > >  
> > > >  /* minimum number of free TX descriptors required to wake up TX process */
> > > > -#define TG3_TX_WAKEUP_THRESH(tnapi)            ((tnapi)->tx_pending / 4)
> > > > +#define TG3_TX_WAKEUP_THRESH(tnapi)    max_t(u32, (tnapi)->tx_pending / 4, \
> > > > +                                             MAX_SKB_FRAGS + 1)
> > > 
> > > I think we should precompute this and store it in something like
> > > tp->tx_wake_thresh.
> > 
> > I've tried this by adding the following patch at the end of the v2
> > series but I did not measure a significant latency improvement. Was
> > there another reason for the change? 
> 
> Just performance.  The wake up threshold is checked in the tx fast path
> in both start_xmit() and tg3_tx().  I would optimize such code for speed

I don't see what you mean. The code in those two functions that used to
invoke TG3_TX_WAKEUP_THRESH is wrapped in unlikely() conditions. You
can't tell me that's the fast path ;) It's only checked when the queue
is stopped.

Moreover, the patches I've sent already add tg3_napi.wakeup_thresh. It
is over those patches that I've made the measurements.

> as much as possible.  In the current code, it was just a right shift
> operation.  Now, with max_t() added, I think I prefer having it
> pre-computed.  The performance difference may not be measurable, but I
> think the compiled code size may be smaller too.

Maybe in certain areas, but not overall:

with v2 patches 1-3
   text    data     bss     dec     hex filename
 149495    1247       0  150742   24cd6 drivers/net/ethernet/broadcom/tg3.o
with v2 patches 1-3 + tx_wake_thresh_def
   text    data     bss     dec     hex filename
 149524    1247       0  150771   24cf3 drivers/net/ethernet/broadcom/tg3.o

I really don't see a gain.



* Re: [PATCH 1/3] tg3: Limit minimum tx queue wakeup threshold
  2014-08-21 23:06       ` Benjamin Poirier
@ 2014-08-21 23:26         ` Michael Chan
  0 siblings, 0 replies; 14+ messages in thread
From: Michael Chan @ 2014-08-21 23:26 UTC (permalink / raw)
  To: Benjamin Poirier; +Cc: Prashant Sreedharan, netdev, linux-kernel

On Thu, 2014-08-21 at 16:06 -0700, Benjamin Poirier wrote: 
> On 2014/08/21 15:32, Michael Chan wrote:
> > On Thu, 2014-08-21 at 15:04 -0700, Benjamin Poirier wrote: 
> > > On 2014/08/19 15:00, Michael Chan wrote:
> > > > On Tue, 2014-08-19 at 11:52 -0700, Benjamin Poirier wrote: 
> > > > > diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
> > > > > index 3ac5d23..b11c0fd 100644
> > > > > --- a/drivers/net/ethernet/broadcom/tg3.c
> > > > > +++ b/drivers/net/ethernet/broadcom/tg3.c
> > > > > @@ -202,7 +202,8 @@ static inline void _tg3_flag_clear(enum TG3_FLAGS flag, unsigned long *bits)
> > > > >  #endif
> > > > >  
> > > > >  /* minimum number of free TX descriptors required to wake up TX process */
> > > > > -#define TG3_TX_WAKEUP_THRESH(tnapi)            ((tnapi)->tx_pending / 4)
> > > > > +#define TG3_TX_WAKEUP_THRESH(tnapi)    max_t(u32, (tnapi)->tx_pending / 4, \
> > > > > +                                             MAX_SKB_FRAGS + 1)
> > > > 
> > > > I think we should precompute this and store it in something like
> > > > tp->tx_wake_thresh.
> > > 
> > > I've tried this by adding the following patch at the end of the v2
> > > series but I did not measure a significant latency improvement. Was
> > > there another reason for the change? 
> > 
> > Just performance.  The wake up threshold is checked in the tx fast path
> > in both start_xmit() and tg3_tx().  I would optimize such code for speed
> 
> I don't see what you mean. The code in those two functions that used to
> invoke TG3_TX_WAKEUP_THRESH is wrapped in unlikely() conditions. You
> can't tell me that's the fast path ;) It's only checked when the queue
> is stopped.

I missed the unlikely().  So you're right.  It's not really in the fast
path.

> 
> Moreover, the patches I've sent already add tg3_napi.wakeup_thresh. It
> is over those patches that I've made the measurements.

Right.  But my original comment was about your original patch #1, which
added max_t() to the TG3_TX_WAKEUP_THRESH macro without adding the
wakeup_thresh field.  All my comments (performance and smaller code)
were based on that original patch #1.  Later I saw that your patch 3
converted TG3_TX_WAKEUP_THRESH to a structure field, so it's no longer
an issue.

> 
> > as much as possible.  In the current code, it was just a right shift
> > operation.  Now, with max_t() added, I think I prefer having it
> > pre-computed.  The performance difference may not be measurable, but I
> > think the compiled code size may be smaller too.
> 
> Maybe in certain areas, but not overall:
> 
> with v2 patches 1-3
>    text    data     bss     dec     hex filename
>  149495    1247       0  150742   24cd6 drivers/net/ethernet/broadcom/tg3.o
> with v2 patches 1-3 + tx_wake_thresh_def
>    text    data     bss     dec     hex filename
>  149524    1247       0  150771   24cf3 drivers/net/ethernet/broadcom/tg3.o
> 
> I really don't see a gain.
> 

Agreed.  Once you have converted the TG3_TX_WAKEUP_THRESH to a structure
field, that's sufficient.  No need to have multiple fields.  Thanks.



* Re: [PATCH 2/3] tg3: Fix tx_pending checks for tg3_tso_bug
  2014-08-21 21:59         ` Benjamin Poirier
@ 2014-08-22  4:25           ` David Miller
  0 siblings, 0 replies; 14+ messages in thread
From: David Miller @ 2014-08-22  4:25 UTC (permalink / raw)
  To: bpoirier; +Cc: mchan, prashant, netdev, linux-kernel

From: Benjamin Poirier <bpoirier@suse.de>
Date: Thu, 21 Aug 2014 14:59:11 -0700

> Ah, now I understand the reason for the * 3 in
> 	u32 frag_cnt_est = skb_shinfo(skb)->gso_segs * 3;
> 
> 	/* Estimate the number of fragments in the worst case */
> but that is not really the "worst case". It's not forbidden to have more than
> two frags per skb output from skb_gso_segment(). I've kept this estimation
> approach but I've added code to validate the estimation or else linearize the
> skb.

This is a common situation drivers run into, and there have been a few
notable cases in virtualization drivers recently.  In those cases,
though, the problem arises when an SKB fragment is a compound page:
such a fragment can require multiple descriptors, one for each 4K
segment within the fragment.

Anyways, the point I wanted to make is that you shouldn't do anything
too complicated to handle all of this.  And I think your conclusion
to linearize if the estimation fails is a good one.


end of thread, other threads:[~2014-08-22  4:25 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-19 18:52 [PATCH 1/3] tg3: Limit minimum tx queue wakeup threshold Benjamin Poirier
2014-08-19 18:52 ` [PATCH 2/3] tg3: Fix tx_pending checks for tg3_tso_bug Benjamin Poirier
2014-08-19 20:56   ` Benjamin Poirier
2014-08-19 23:10   ` Michael Chan
2014-08-21  1:23     ` Benjamin Poirier
2014-08-21  9:51       ` Michael Chan
2014-08-21 21:59         ` Benjamin Poirier
2014-08-22  4:25           ` David Miller
2014-08-19 18:52 ` [PATCH 3/3] tg3: Fix tx_pending check for MAX_SKB_FRAGS Benjamin Poirier
2014-08-19 22:00 ` [PATCH 1/3] tg3: Limit minimum tx queue wakeup threshold Michael Chan
2014-08-21 22:04   ` Benjamin Poirier
2014-08-21 22:32     ` Michael Chan
2014-08-21 23:06       ` Benjamin Poirier
2014-08-21 23:26         ` Michael Chan
