From mboxrd@z Thu Jan 1 00:00:00 1970 From: Zoltan Kiss Subject: Re: RING_HAS_UNCONSUMED_REQUESTS oddness Date: Tue, 11 Mar 2014 23:34:13 +0000 Message-ID: <531F9D75.5030600@citrix.com> References: <5318987C.3030303@citrix.com> <1394121221.13270.10.camel@hastur.hellion.org.uk> <5318A2D8.3090808@citrix.com> <20140306173057.GK11475@deinos.phlegethon.org> <5318EB1C.10500@citrix.com> <1394553355.30915.72.camel@kazak.uk.xensource.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: Received: from mail6.bemta14.messagelabs.com ([193.109.254.103]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1WNWBe-0002ub-2y for xen-devel@lists.xenproject.org; Tue, 11 Mar 2014 23:34:18 +0000 In-Reply-To: <1394553355.30915.72.camel@kazak.uk.xensource.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Ian Campbell Cc: "xen-devel@lists.xenproject.org" , Tim Deegan , Wei Liu List-Id: xen-devel@lists.xenproject.org On 11/03/14 15:55, Ian Campbell wrote: > On Thu, 2014-03-06 at 21:39 +0000, Zoltan Kiss wrote: >> On 06/03/14 17:30, Tim Deegan wrote: >>> At 16:31 +0000 on 06 Mar (1394119880), Zoltan Kiss wrote: >>>> On 06/03/14 15:53, Ian Campbell wrote: >>>>> On Thu, 2014-03-06 at 15:47 +0000, Zoltan Kiss wrote: >>>>>> By my understanding, there is no way rsp could be smaller than req, so >>>>>> there is no point having this. Am I missing something? >>>>> >>>>> It happens during wraparound, i.e. after req has wrapped but rsp hasn't >>>>> yet. >>>> >>>> The name of the macro suggest we are interested whether the ring has >>>> unconsumed requests, and netback uses it that way. The answer to that >>>> question is req_prod - req_cons. And it works if prod wrapped but cons >>>> didn't. >>> >>> Yes. >>> >>>> rsp calculates the number of "consumed but not responded" requests (it >>>> also works well if req_cons wrapped but rsp_prod_pvt didn't), then >>>> subtract it from the ring size. >>> >>> That is indeed an odd thing to check, since it seems like it could only >>> be relevant if the request producer overran the response producer. >>> It's been there in one form or another since the original ring.h, >>> and RING_REQUEST_CONS_OVERFLOW does something similar. >>> >>> I can't remember the original reasoning, and so I'm reluctant to >>> suggest removing it without some more eyes on the code... >> >> I've added the following printk before the "req < rsp" part: >> >> if (rsp < req) \ >> pr_err("req %u rsp %u req_prod %u req_cons %u rsp_prod_pvt %u\n", req, >> rsp, (_r)->sring->req_prod, (_r)->req_cons, (_r)->rsp_prod_pvt); \ >> >> And it gave me such results: >> >> xen_netback:xenvif_zerocopy_callback: req 4294967279 rsp 52 req_prod >> 1770663942 req_cons 1770663959 rsp_prod_pvt 1770663755 >> >> So it can happen that req_prod is behind req_cons, sometimes even with >> 17! But it always happen in this callback of my new grant mapping >> series, which runs outside the NAPI instance. My theory why this can happen: >> - callback reads req_prod >> - frontend writes it >> - backend picks it up, and consumes those slots >> - callback reads req_cons > > I'm a bit confused by what you mean by "it" in your theory, so perhaps I > misunderstand. Can you use the actual variable names for clarity. I meant req_prod all the time. > Are you sure this is a problem with the actual code and not just with > your debug print? I would expect the real code to be snapshotting things > as appropriate etc, and also for the public/private state to not > necessarily be totally in sync when RING_HAS_UNCONSUMED_REQUESTS is > being called. I think my above code does the right thing. For double check I printed out req, rsp, and the values they are calculated from. I guess req_prod is cached in the printk and that's why it is still the same value as we read when calculating req. I'll test that with a memory barrier. > >> So req can be near UINT_MAX if you call this macro outside the backend. >> The only place where the actual return value of this macro matters is >> xenvif_tx_build_gops, and it should be correct there. At other places we >> are only looking for the fact whether the ring has unconsumed requests >> or not. If prod is smaller than cons, we clearly read a wrong value. I >> think what we can do: >> 1. try again until its correct >> 2. just return a non-zero value, it shouldn't cause too much trouble if >> we say yes here >> 3. we can't see rsp_cons, so try to figure out if the ring is full of >> consumed but not responded requests, and return zero then, otherwise a >> positive value. That's what we do know. > > s/know/now/? Is #3 here the status quo? Yes, that's the status quo. It's a best effort calculation, rsp takes only effect if req ~ UINT_MAX. > >> Does this make sense? Should we rather go option 1? Should I post a >> comment patch to document this, and spare a few hours for future >> generations? :) > > Docs are always good IMHO. > > Ian. >