From mboxrd@z Thu Jan 1 00:00:00 1970 From: Konrad Rzeszutek Wilk Subject: Re: [RFC PATCH] dpci: Put the dpci back on the list if running on another CPU. Date: Mon, 2 Feb 2015 12:44:46 -0500 Message-ID: <20150202174446.GA8348@l.oracle.com> References: <1421081130-22959-1-git-send-email-konrad.wilk@oracle.com> <54B4FF600200007800054376@mail.emea.novell.com> <20150202142920.GA661@l.oracle.com> <54CFA395020000780005C072@mail.emea.novell.com> <20150202153159.GA3163@l.oracle.com> <54CFAA53020000780005C103@mail.emea.novell.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from mail6.bemta5.messagelabs.com ([195.245.231.135]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1YIL3Q-0000Jn-8C for xen-devel@lists.xenproject.org; Mon, 02 Feb 2015 17:44:56 +0000 Content-Disposition: inline In-Reply-To: <54CFAA53020000780005C103@mail.emea.novell.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Jan Beulich Cc: andrew.cooper3@citrix.com, malcolm.crossley@citrix.com, xen-devel@lists.xenproject.org, linux@eikelenboom.it List-Id: xen-devel@lists.xenproject.org On Mon, Feb 02, 2015 at 03:48:19PM +0000, Jan Beulich wrote: > >>> On 02.02.15 at 16:31, wrote: > > On Mon, Feb 02, 2015 at 03:19:33PM +0000, Jan Beulich wrote: > >> >>> On 02.02.15 at 15:29, wrote: > >> > --- a/xen/drivers/passthrough/io.c > >> > +++ b/xen/drivers/passthrough/io.c > >> > @@ -63,10 +63,37 @@ enum { > >> > static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci) > >> > { > >> > unsigned long flags; > >> > + unsigned long old, new, val = pirq_dpci->state; > >> > > >> > - if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) ) > >> > - return; > >> > + /* > >> > + * This cmpxch is a more clear version of: > >> > + * if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) ) > >> > + * return; > >> > + * since it also checks for STATE_RUN conditions. > >> > + */ > >> > + for ( ;; ) > >> > + { > >> > + new = 1 << STATE_SCHED; > >> > + if ( val ) > >> > + new |= val; > >> > >> Why the if()? > > > > To 'or' the variable new with '1 << STATE_RUN' in case 'val' changed from > > the first read ('val = pirq_dpci->state') to the moment when > > we do the cmpxchg. > > But if "val" is zero, the | simply will do nothing. Correct. Keep in mind that 'new' is set to '1 << STATE_SCHED' at every loop iteration - so it ends up old = cmpxchg(.., 0, 1 << STATE_SCHED) (and old == 0, val == 0, so we end up breaking out of the loop). > > >> > + old = cmpxchg(&pirq_dpci->state, val, new); > >> > + switch ( old ) > >> > + { > >> > + case (1 << STATE_SCHED): > >> > + case (1 << STATE_RUN) | (1 << STATE_SCHED): > >> > + /* > >> > + * Whenever STATE_SCHED is set we MUST not schedule it. > >> > + */ > >> > + return; > >> > + } > >> > + /* > >> > + * If the 'state' is 0 or (1 << STATE_RUN) we can schedule it. > >> > + */ > >> > >> Really? Wasn't the intention to _not_ schedule when STATE_RUN is > >> set? (Also the above is a only line comment, i.e. wants different style.) > > > > I must confess I must have misread your last review. You said: > > > > > Here is a patch that does this. I don't yet have an setup to test > > > the failing scenario (working on that). I removed the part in > > > the softirq because with this patch I cannot see a way it would > > > ever get there (in the softirq hitting the BUG). > > > > Hmm, but how do you now schedule the second instance that needed ... > > > > > + case (1 << STATE_RUN): > > > > ... in this case? > > > > which to me implied you still want to schedule an 'dpci' when STATE_RUN is > > set? > > > > My thinking is that we should still schedule an 'dpci' when STATE_RUN is set > > as that is inline with what the old tasklet code did. It would > > schedule the tasklet the moment said tasklet was off the global list (but > > it would not add it in the global list - that would be the job of the > > tasklet function wrapper to detect and insert said tasklet back on > > the global list). > > Didn't the original discussion (and issue) revolve around scheduling > while STATE_RUN was set? Hence the intention to wait for the flag Yes. > to clear - but preferably in an explicit rather than implicit (as your > current and previous patch do) manner. If we do explicitly we run risk of dead-lock. See below of an draft (not even compiled tested) of what I think you mean. The issue is that 'raise_softirq_for' ends up being called from do_IRQ. And we might have an IRQ coming in just as we are in the dpci_softirq having set the 1 << STATE_SCHED. Our spin-and-wait in raise_softirq_for (in this code below) will spin forever. One way to not get in that quagmire is to well, do what the previous patches did. diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c index ae050df..706a636 100644 --- a/xen/drivers/passthrough/io.c +++ b/xen/drivers/passthrough/io.c @@ -63,10 +63,35 @@ enum { static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci) { unsigned long flags; + unsigned long old, new, val = pirq_dpci->state; - if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) ) - return; - + for ( ;; ) + { + old = cmpxchg(&pirq_dpci->state, 0, 1 << STATE_SCHED); + switch ( old ) + { + case (1 << STATE_SCHED): + /* + * Whenever STATE_SCHED is set we MUST not schedule it. + */ + return; + case (1 << STATE_RUN) | (1 << STATE_SCHED): + case (1 << STATE_RUN): + /* Getting close to finish. Spin. */ + continue; + } + /* + * If the 'state' is 0 we can schedule it. + */ + if ( old == 0 ) + break; + } get_knownalive_domain(pirq_dpci->dom); local_irq_save(flags); > > Jan >