From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ian Campbell Subject: Re: RING_HAS_UNCONSUMED_REQUESTS oddness Date: Thu, 13 Mar 2014 10:04:59 +0000 Message-ID: <1394705099.25873.11.camel@kazak.uk.xensource.com> References: <5318987C.3030303@citrix.com> <1394552666.30915.64.camel@kazak.uk.xensource.com> <531F9B17.5060107@citrix.com> <1394620083.21145.21.camel@kazak.uk.xensource.com> <53206ED4.1030507@citrix.com> <1394634655.21145.94.camel@kazak.uk.xensource.com> <532079D9.70008@citrix.com> <1394638643.3457.8.camel@kazak.uk.xensource.com> <532095E5.2080807@citrix.com> <1394646199.3457.20.camel@kazak.uk.xensource.com> <5320CD47.7010405@citrix.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from mail6.bemta5.messagelabs.com ([195.245.231.135]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1WO2Vm-0005oq-QH for xen-devel@lists.xenproject.org; Thu, 13 Mar 2014 10:05:16 +0000 In-Reply-To: <5320CD47.7010405@citrix.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Zoltan Kiss Cc: "xen-devel@lists.xenproject.org" , Wei Liu , Tim Deegan List-Id: xen-devel@lists.xenproject.org On Wed, 2014-03-12 at 21:10 +0000, Zoltan Kiss wrote: > On 12/03/14 17:43, Ian Campbell wrote: > > On Wed, 2014-03-12 at 17:14 +0000, Zoltan Kiss wrote: > >> On 12/03/14 15:37, Ian Campbell wrote: > >>> On Wed, 2014-03-12 at 15:14 +0000, Zoltan Kiss wrote: > >>>> On 12/03/14 14:30, Ian Campbell wrote: > >>>>> On Wed, 2014-03-12 at 14:27 +0000, Zoltan Kiss wrote: > >>>>>> On 12/03/14 10:28, Ian Campbell wrote: > >>>>>>> On Tue, 2014-03-11 at 23:24 +0000, Zoltan Kiss wrote: > >>>>>>>> On 11/03/14 15:44, Ian Campbell wrote: > >>>>>>> > >>>>>>>>> Is it the case that this macro considers a request to be unconsumed if > >>>>>>>>> the *response* to a request is outstanding as well as if the request > >>>>>>>>> itself is still on the ring? > >>>>>>>> I don't think that would make sense. I think everywhere where this macro > >>>>>>>> is called the caller is not interested in pending request (pending means > >>>>>>>> consumed but not responded) > >>>>>>> > >>>>>>> It might be interested in such pending requests in some of the > >>>>>>> pathological cases I allude to in the next paragraph though? > >>>>>>> > >>>>>>> For example if the ring has unconsumed requests but there are no slots > >>>>>>> free for a response, it would be better to treat it as no unconsumed > >>>>>>> requests until space opens up for a response, otherwise something else > >>>>>>> just has to abort the processing of the request when it notices the lack > >>>>>>> of space. > >>>>>>> > >>>>>>> (I'm totally speculating here BTW, I don't have any concrete idea why > >>>>>>> things are done this way...) > >>>>>>> > >>>>>>> > >>>>>>>>> I wonder if this apparently weird construction is due to pathological > >>>>>>>>> cases when one or the other end is not picking up requests/responses? > >>>>>>>>> i.e. trying to avoid deadlocking the ring or generating an interrupt > >>>>>>>>> storm when the ring it is full of one or the other or something along > >>>>>>>>> those lines? > >>>>>>> > >>>>>>> > >>>>>> > >>>>>> Also, let me quote again my example about when rsp makes sense: > >>>>>> > >>>>>> "To clarify what does this do, let me show an example: > >>>>>> req_prod = 253 > >>>>>> req_cons = 256 > >>>>>> rsp_prod_pvt = 0 > >>>>> > >>>>> I think to make sense of this I need to see the sequence of reads/writes > >>>>> from both parties in a sensible ordering which would result in reads > >>>>> showing the above. i.e. a demonstration of the race not just an > >>>>> assertion that if the values are read as is things makes sense. > >>>> > >>>> Let me extend it: > >>>> > >>>> - callback reads req_prod = 253 > >>> > >>> callback == backend? Which context is this code running in? Which part > >>> of the system is the callback logically part of? > >> Yes, it is part of the backend, the function which handles when we can > >> release a slot back. With grant copy we don't have such thing, but with > >> mapping xenvif_zerocopy_callback does this (or in classic kernel, it had > >> a different name, but we called it page destructor). It can run from any > >> context, it depends on who calls kfree_skb. > > > > I think this is the root of the problem. The pv protocols really assume > > one entity on either end is moving/updating the ring pointers. If you > There is only _one_ entity moving/updating the ring pointers. Everyone > else is just reading it. The callback, xenvif_tx_interrupt, > xenvif_check_rx_xenvif. Perhaps I should have said "accessing" rather than "moving/updating". Even if only one entity is reading the ring if two entities are reading at least one of them is going to see inconsistencies. > > In any case, it seems like doing the poke from the callback is wrong and > > we should revert the patches which DaveM already applied and revisit > > this aspect of things, do you agree? > I've just sent in a patch to fix that. Thanks, I'll take a look. > I think the reason why I haven't > seen any issue is that the in this situation there are plenty of > outstanding packets, and all of their callback will schedule NAPI again. > Chances are quite small that the dealloc thread couldn't release enough > slots in the meantime. Seems plausible enough. That means that the issue would be seen rarely on quiet systems, which would be a nightmare to diagnose I think. Ian.