From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wei Liu
Subject: Re: Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"
Date: Mon, 17 Mar 2014 10:35:24 +0000
Message-ID: <20140317103524.GH16807@zion.uk.xensource.com>
References: <20140312120444.GH19620@zion.uk.xensource.com>
 <751560446.20140312152336@eikelenboom.it>
 <20140312144826.GK19620@zion.uk.xensource.com>
 <1241369584.20140312154946@eikelenboom.it>
 <20140312145915.GM19620@zion.uk.xensource.com>
 <309265573.20140312160156@eikelenboom.it>
 <20140312150435.GO19620@zion.uk.xensource.com>
 <1934414370.20140312162003@eikelenboom.it>
 <20140312154501.GQ19620@zion.uk.xensource.com>
 <1189397636.20140312174729@eikelenboom.it>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <1189397636.20140312174729@eikelenboom.it>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Sander Eikelenboom
Cc: annie li, Paul Durrant, Wei Liu, Zoltan Kiss, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On Wed, Mar 12, 2014 at 05:47:29PM +0100, Sander Eikelenboom wrote:
>
> Wednesday, March 12, 2014, 4:45:01 PM, you wrote:
>
> > On Wed, Mar 12, 2014 at 04:20:03PM +0100, Sander Eikelenboom wrote:
> > [...]
> >>
> >> > Sorry, remove the trailing "S". Actually you only need to look at netback.c.
> >>
> >> What producer index to compare with .. there are quite some RING_GET_REQUESTS .. and i see:
> >> npo->meta_prod
> >> vif->rx.sring->req_prod
> >> vif->pending_prod
> >>
> >> to name a few ..
> >> Any particular RING_GET_REQUESTS call and particular producer index you are interested in ?
> >>
>
> > There are two macros you can use
>
> > RING_REQUEST_CONS_OVERFLOW and RING_REQUEST_PROD_OVERFLOW.
>
> Ah i already produced my own .. diff to netback is attached ..
>
> Netback:
> Mar 12 17:41:26 serveerstertje kernel: [ 464.778614] vif vif-7-0 vif7.0: ?!? npo->meta_prod:37 vif->rx.sring->req_prod:431006 vif->rx.req_cons:431007
> Mar 12 17:41:26 serveerstertje kernel: [ 464.786203] vif vif-7-0 vif7.0: ?!? npo->meta_prod:38 vif->rx.sring->req_prod:431006 vif->rx.req_cons:431008

req_prod < req_cons, so there's an overflow: the consumer index has run
past the producer index. I'm actually curious how this could happen.

Back to the code: before netback enqueues an SKB on its internal queue,
it checks whether there is enough room in the ring. Before Paul's
changeset, it checked against a static number (the maximum number of
slots an SKB could possibly consume). Paul's changeset made it check
against the number of slots the incoming SKB actually consumes. See
interface.c:xenvif_start_xmit.

Another interesting site is where the SKB is later broken down from the
internal queue. See netback.c:xenvif_rx_action. The routine that breaks
down an SKB is xenvif_gop_skb.

Both look alright to me, but you might want to instrument them a bit
more to see what triggers that overflow.

It's a bit frustrating, but a bug that cannot be easily reproduced is
indeed extremely hard to fix.

Wei.
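
For illustration only, the req_prod/req_cons comparison above can be
sketched as a small standalone C fragment. This is not netback code:
the sketch_* names and the ring size of 256 are assumptions made up for
the example. It shows why unsigned index arithmetic makes a consumer
overrun stand out, and what a room check against the slots actually
needed might look like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring size; 256 is an assumption, not taken from the log. */
#define SKETCH_RING_SIZE 256u

/* Free-running 32-bit ring index, like the logged req_prod/req_cons. */
typedef uint32_t ring_idx_t;

/*
 * Requests posted by the producer that the consumer has not used yet.
 * Unsigned subtraction stays correct when the indices wrap past 2^32,
 * but yields a huge value when cons has run ahead of prod.
 */
static inline ring_idx_t sketch_unconsumed(ring_idx_t req_prod,
                                           ring_idx_t req_cons)
{
    return (ring_idx_t)(req_prod - req_cons);
}

/* True when the consumer index has run past the producer index. */
static inline bool sketch_cons_overran(ring_idx_t req_prod,
                                       ring_idx_t req_cons)
{
    return sketch_unconsumed(req_prod, req_cons) > SKETCH_RING_SIZE;
}

/*
 * Room check against the number of slots one SKB actually needs,
 * rather than a static worst-case number.
 */
static inline bool sketch_has_room(ring_idx_t req_prod, ring_idx_t req_cons,
                                   unsigned int needed)
{
    /* A single SKB can never need more slots than the ring holds. */
    assert(needed <= SKETCH_RING_SIZE);

    if (sketch_cons_overran(req_prod, req_cons))
        return false;
    return sketch_unconsumed(req_prod, req_cons) >= needed;
}
```

With the values from the two log lines, sketch_cons_overran(431006,
431007) and sketch_cons_overran(431006, 431008) both return true, which
is the condition worth trapping when instrumenting the two sites named
above.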