From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Miller
Subject: Re: Gianfar: RX Recycle skb->len error
Date: Sun, 21 Mar 2010 21:46:42 -0700 (PDT)
Message-ID: <20100321.214642.67901344.davem@davemloft.net>
References:
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: netdev@vger.kernel.org, avorontsov@ru.mvista.com, Sandeep.Kumar@freescale.com
To: ben@bigfootnetworks.com
Return-path:
Received: from 74-93-104-97-Washington.hfc.comcastbusiness.net ([74.93.104.97]:46515 "EHLO sunset.davemloft.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750896Ab0CVEqU (ORCPT ); Mon, 22 Mar 2010 00:46:20 -0400
In-Reply-To:
Sender: netdev-owner@vger.kernel.org
List-ID:

From: "Ben Menchaca (ben@bigfootnetworks.com)"
Date: Sat, 20 Mar 2010 12:54:59 -0700

> We are seeing some random skb data length errors on RX after
> long-running, full-gigabit traffic. First, my debugging and solution
> are based on the following invariant assumption:
> (skb->tail - skb->data) == skb->len
>
> If this is wrong, please educate.
>
> After some tracing, here is where the error packets seem to originate:
> 1. We are cleaning rx, in gfar_clean_rx_ring;
> 2. A new RX skb is drawn from the rx_recycle queue, and obeys the above
>    invariant (so, in gfar_new_skb(), __skb_dequeue returns an skb);
> 3. At this point skb_reserve is called, which moves data and tail by
>    the same calculated alignamount;
> 4. So, newskb is not NULL. However,
>    (!(bdp->status & RXBD_LAST) || (bdp->status & RXBD_ERR))
>    evaluates to true;
> 5. Since newskb is not NULL, we arrive at the else if (skb), which is
>    true;
> 6. skb->data = skb->head + NET_SKB_PAD is applied, and then the skb is
>    requeued for recycling.
>
> At this point, skb->data != skb->tail, but skb->len == 0. When this
> skb is used for the next RX, it is causing issues later when we
> skb_put trailers, and then trust skb->len.
>
> I would propose something like:

Thanks for debugging this, some gianfar developers CC:'d.
> @@ -2540,6 +2540,7 @@
>  			 * recycle list.
>  			 */
>  			skb->data = skb->head + NET_SKB_PAD;
> +			skb_reset_tail_pointer(skb);
>  			__skb_queue_head(&priv->rx_recycle, skb);
>  		}
>  	} else {

This code is essentially trying to undo skb_reserve() but as you found
it's doing so in a buggy manner.

skb_reserve() adjusts both the 'data' and 'tail' pointers, but this
attempt at a reversal is only modifying 'data'.

Your fix is fine, but really any by-hand modification of skb->data is
a bug, and we should provide an skb_unreserve() or similar to hide
such details away, and use it here.

Anton?
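
For readers following the thread, the pointer arithmetic can be checked with a small user-space model. This is only a sketch under stated assumptions: `struct toy_skb`, `TOY_PAD` and every `toy_*` function are hypothetical stand-ins written for illustration, not kernel code, and the model uses plain pointers even where a real kernel build may store 'tail' as an offset. In particular `toy_unreserve()` is one guess at what the proposed skb_unreserve() might do; no such helper is claimed to exist.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAD 32 /* stands in for NET_SKB_PAD; the value is arbitrary */

/* Hypothetical, stripped-down model of the sk_buff fields discussed above. */
struct toy_skb {
	unsigned char buf[256];
	unsigned char *head; /* start of the buffer */
	unsigned char *data; /* start of the payload */
	unsigned char *tail; /* end of the payload */
	unsigned int len;    /* payload length */
};

/* State of an empty skb as it comes off the recycle queue. */
static void toy_init(struct toy_skb *skb)
{
	skb->head = skb->buf;
	skb->data = skb->tail = skb->head + TOY_PAD;
	skb->len = 0;
}

/* Models skb_reserve(): 'data' and 'tail' move together. */
static void toy_reserve(struct toy_skb *skb, int n)
{
	skb->data += n;
	skb->tail += n;
}

/* Models skb_put(): grows the payload at the tail, returns the old tail. */
static unsigned char *toy_put(struct toy_skb *skb, unsigned int n)
{
	unsigned char *old_tail = skb->tail;

	skb->tail += n;
	skb->len += n;
	return old_tail;
}

/* Hypothetical skb_unreserve(): rewinds BOTH pointers, not just 'data'. */
static void toy_unreserve(struct toy_skb *skb)
{
	skb->data = skb->head + TOY_PAD;
	skb->tail = skb->data;
}

/* The invariant from the report: (tail - data) == len. */
static int toy_invariant_holds(const struct toy_skb *skb)
{
	assert(skb->tail >= skb->data);
	return (size_t)(skb->tail - skb->data) == skb->len;
}

/* Current recycle path: only 'data' is reset by hand. Returns 1 if the
 * invariant still holds afterwards, 0 if it is broken. */
int toy_recycle_data_only(int align)
{
	struct toy_skb skb;

	toy_init(&skb);
	toy_reserve(&skb, align);
	skb.data = skb.head + TOY_PAD; /* 'tail' is left 'align' bytes ahead */
	return toy_invariant_holds(&skb);
}

/* Recycle path using the helper, followed by a simulated reuse for RX. */
int toy_recycle_with_unreserve(int align)
{
	struct toy_skb skb;

	toy_init(&skb);
	toy_reserve(&skb, align);
	toy_unreserve(&skb);
	toy_reserve(&skb, align); /* next RX reserves again */
	toy_put(&skb, 64);        /* and appends received bytes */
	return toy_invariant_holds(&skb);
}

#ifdef TOY_SKB_MAIN /* build with -DTOY_SKB_MAIN for a standalone check */
int main(void)
{
	return !(toy_recycle_data_only(8) == 0 &&
		 toy_recycle_with_unreserve(8) == 1);
}
#endif
```

In this model, toy_recycle_data_only() returns 0 for any non-zero alignment, because 'tail' stays ahead of 'data' while 'len' is 0, which is the state described in the report. toy_recycle_with_unreserve() returns 1. Resetting 'tail' to 'data' is also what the proposed skb_reset_tail_pointer() line does in the patch above.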