From: "Durrant, Paul" <pdurrant@amazon.com>
To: Julien Grall <julien@xen.org>, Ian Jackson <ian.jackson@citrix.com>
Cc: "Jürgen Groß" <jgross@suse.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"osstest service owner" <osstest-admin@xenproject.org>,
	"Anthony Perard" <anthony.perard@citrix.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: RE: xen-block: race condition when stopping the device (WAS: Re: [Xen-devel] [xen-4.13-testing test] 144736: regressions - FAIL)
Date: Mon, 16 Dec 2019 09:34:27 +0000	[thread overview]
Message-ID: <e49691262df2450aa48522dc38f80657@EX13D32EUC003.ant.amazon.com> (raw)
In-Reply-To: <65df8a75-a658-1a14-6780-66c8706bcc80@xen.org>

> -----Original Message-----
[snip]
> >>
> >> This feels like a race condition between the init/free code and the
> >> handler. Anthony, does it ring any bell?
> >>
> >
> > From that stack backtrace it looks like an iothread managed to run
> > after the sring was NULLed. This should not be able to happen, as the
> > dataplane should have been moved back onto QEMU's main thread context
> > before the ring is unmapped.
> 
> My knowledge of this code is fairly limited, so correct me if I am wrong.
> 
> blk_set_aio_context() would set the context for the block aio. AFAICT,
> the only aio handler for the block device is xen_block_complete_aio().

Not quite. xen_block_dataplane_start() calls xen_device_bind_event_channel(), and that adds an event channel fd into the aio context, so the iothread polls the shared ring as well as handling block I/O completion.
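
For reference, the binding boils down to registering the evtchn fd with the dataplane's AioContext, roughly as below. This is a from-memory sketch of the hw/xen/xen-bus.c path, so the exact aio_set_fd_handler() argument list may differ between QEMU versions:

  /* Sketch, not verbatim: xen_device_bind_event_channel() ends up
   * doing roughly this, putting the event channel fd under the
   * iothread's AioContext so the iothread polls it alongside
   * block I/O completion. */
  aio_set_fd_handler(channel->ctx,               /* iothread's context */
                     xenevtchn_fd(channel->xeh), /* event channel fd   */
                     false,                      /* not "external"     */
                     xen_device_event,           /* fd read handler    */
                     NULL,                       /* no write handler   */
                     xen_device_poll,            /* poll handler       */
                     channel);                   /* opaque             */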

> 
> In the stack above, we are not dealing with a block aio but an aio tied
> to the event channel (see the call from xen_device_poll). So I don't
> think blk_set_aio_context() would affect that aio.
> 

For the reason I outline above, it does.

> So it would be possible to get the iothread running because we received
> a notification on the event channel while we are stopping the block
> (i.e. in xen_block_dataplane_stop()).
> 

We should assume an iothread can essentially run at any time, as it is a polling entity. It should eventually block polling on the fds assigned to its aio context, but I don't think the abstraction guarantees that it cannot be woken for other reasons (e.g. off a timeout). However, an event from the frontend will certainly cause the evtchn fd poll to wake up.
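
To illustrate: an iothread is basically a dedicated thread spinning in aio_poll(), along the lines of the simplified sketch below (paraphrased from iothread.c, details from memory):

  /* Simplified sketch of QEMU's iothread main loop. The thread blocks
   * in aio_poll() until something wakes it (an fd becoming readable,
   * a timer expiring, a bottom half being scheduled), so a frontend
   * kick on the evtchn fd can wake it at essentially any time. */
  static void *iothread_run(void *opaque)
  {
      IOThread *iothread = opaque;

      while (iothread->running) {
          aio_poll(iothread->ctx, true /* blocking */);
      }
      return NULL;
  }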

> If xen_block_dataplane_stop() grabs the context lock first, then the
> iothread dealing with the event may wait on the lock until it is
> released.
> 
> By the time the lock is grabbed, we may have freed all the resources
> (including the srings). So the event iothread will end up dereferencing
> a NULL pointer.
> 
> 

I think the problem may actually be that xen_block_dataplane_event() does not acquire the AioContext lock, and thus is not synchronized with xen_block_dataplane_stop(). The documentation in multiple-iothreads.txt is not clear on whether a poll handler called by an iothread needs to acquire the lock, though; TBH I would not have thought it necessary.
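
If handlers are expected to take the lock themselves, then I imagine the fix would be something like the sketch below in xen_block_dataplane_event() (untested, error handling elided):

  static void xen_block_dataplane_event(void *opaque)
  {
      XenBlockDataPlane *dataplane = opaque;

      /* Hypothetical fix: take the AioContext lock so this handler is
       * serialized against xen_block_dataplane_stop(), which would
       * acquire the same context around ring teardown. */
      aio_context_acquire(dataplane->ctx);
      if (xen_block_handle_requests(dataplane)) {
          xen_device_notify_event_channel(dataplane->xendev,
                                          dataplane->event_channel,
                                          NULL); /* errp elided */
      }
      aio_context_release(dataplane->ctx);
  }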

> It feels to me that we need a way to quiesce all the iothreads (blk,
> event, ...) before continuing. But I am a bit unsure how to do this in
> QEMU.
> 

Looking at virtio-blk.c, I see that it does seem to close off its evtchn equivalent from the iothread context via aio_wait_bh_oneshot(). So I wonder whether the 'right' thing to do is to call xen_device_unbind_event_channel() using the same mechanism, to ensure xen_block_dataplane_event() can't race.
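
A rough sketch of what that might look like, modelled on virtio-blk's virtio_blk_data_plane_stop() (names and placement are guesses):

  /* Hypothetical: run the unbind as a one-shot BH in the iothread's
   * own context, so it cannot interleave with
   * xen_block_dataplane_event() executing there. */
  static void xen_block_dataplane_stop_bh(void *opaque)
  {
      XenBlockDataPlane *dataplane = opaque;

      xen_device_unbind_event_channel(dataplane->xendev,
                                      dataplane->event_channel,
                                      NULL); /* errp elided */
  }

  ...and then in xen_block_dataplane_stop(), before the ring is unmapped:

  aio_wait_bh_oneshot(dataplane->ctx, xen_block_dataplane_stop_bh,
                      dataplane);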

  Paul

> Cheers,
> 
> --
> Julien Grall


