From mboxrd@z Thu Jan 1 00:00:00 1970 From: Wei Liu Subject: Re: Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected" Date: Wed, 12 Mar 2014 12:04:44 +0000 Message-ID: <20140312120444.GH19620@zion.uk.xensource.com> References: <1554992598.20140307125518@eikelenboom.it> <9610144106.20140311000026@eikelenboom.it> <20140311101948.GX19620@zion.uk.xensource.com> <191977479.20140311133142@eikelenboom.it> <20140311123807.GB19620@zion.uk.xensource.com> <1068180642.20140311140041@eikelenboom.it> <20140311153616.GE19620@zion.uk.xensource.com> <1751914199.20140312024243@eikelenboom.it> <20140312113522.GG19620@zion.uk.xensource.com> <659214230.20140312124516@eikelenboom.it> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Content-Disposition: inline In-Reply-To: <659214230.20140312124516@eikelenboom.it> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Sander Eikelenboom Cc: annie li , Paul Durrant , Wei Liu , Zoltan Kiss , xen-devel@lists.xen.org List-Id: xen-devel@lists.xenproject.org On Wed, Mar 12, 2014 at 12:45:16PM +0100, Sander Eikelenboom wrote: [...] > >> > >> - Sometimes (but not always) netfront also spits out: > >> dev_warn(dev, "Invalid extra type: %d\n", extra->type); > >> where the extra type seems a random value (seen 196, 31 ..) > >> - Sometimes (but not always) netfront also spits out: > >> dev_warn(dev, "Need more slots\n"); > >> - Sometimes (but not always) netfront also spits out: > >> dev_warn(dev, "Missing extra info\n"); > >> > > > That's because garbage is pushed to frontend. It's trying to decode some > > random values. > > >> > >> First question that comes to my mind: > >> - Are the warnings netfront spits out the cause of xen reporting the bad grant reference ? > >> Or > >> Are the Bad grant references Xen is reporting .. causing netfront to spit out the warnings ? > > > No. They are not directly connected. > > > 0. backend breaks down a skb into slots. > > 1. backend gets a request from the rx ring (a slot with gref provided by > > frontend), until all slots in skb are handled. > > 2. backend grant-copies data to that slot (with gref). > > 3. backend pushes a response to frontend, whose content depends on the > > result of previous step. > > 4. frontend handles that response. > > > So the grant copy error in hypervisor you're seeing is in 2. The > > complaint from frontend is in 4, when backend pushes a response with > > error status in it. The bug is most likely in 0, when the calculation > > goes wrong - either in the breakdown process, or in the macro that > > returns usable slots (this can lead to backend thinks there's enough > > slots while there's not). > > Ok so perhaps a debug patch which does more sanity checking there ? > Because i don't see any errors/warnings from netback in dom0. > >>From the look of the code, 0 looks correct to me. Netback won't complain about "bad gref" because it has no idea what a "good gref" looks like. Only Xen has the knowledge whether a gref is legit. All netback sees is the hypercall fails and it tries to push corresponding response to netfront. But you can probably safely assume a gref larger than several thousands be bad. > >> - Why is that "if (rx->flags & XEN_NETRXF_extra_info) {}" part in xen-netfront.c doing there and changing cons midway ? > >> The commit message from f942dc2552b8bfdee607be867b12a8971bb9cd85 that introduced the if says: > >> > >> One major change from xen.git is that the guest transmit path (i.e. what > >> looks like receive to netback) has been significantly reworked to remove > >> the dependency on the out of tree PageForeign page flag (a core kernel > >> patch which enables a per page destructor callback on the final > >> put_page). This page flag was used in order to implement a grant map > >> based transmit path (where guest pages are mapped directly into SKB > >> frags). Instead this version of netback uses grant copy operations into > >> regular memory belonging to the backend domain. Reinstating the grant > >> map functionality is something which I would like to revisit in the > >> future. > >> > >> It *is* using grant copy now .. so should this part have been removed ? > >> And > >> Could Paul's commit that seems to be the first to trigger these events affect this ? > >> > > > This depicts guest *transmit* path, but the issue you're seeing is in > > guest *receive* path. So that's totally different thing. > > Errr yes well if the (rx->flags & XEN_NETRXF_extra_info) {}) should only be doing something on the transmit path .. > why does it seem to result to true on my issue ? (see cons_changed=1 in the debub output.. and you see cons being cons + 1 after that) First thing is the response goes wild and probably contains random value so it's possible the flag is "set". For a legit response, if you see that flag it means there's an "extra info" slot in the response. That code snippet is to extract that extra info slot (could be several in a row, that's why there's a loop in xennet_get_extras). The request / response format is documented in include/xen/interface/io/netif.h -- it's only the tx one, but rx one is more or less the same. Wei. > >