* [Qemu-devel] Question on aio_poll
From: Alex Bligh @ 2013-07-20 13:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Bligh

As part of adding timer operations to aio_poll and a clock to AioContext, I
am trying to figure out a couple of points on how aio_poll works. This
primarily revolves around the flag busy.

Firstly, at the end of aio_poll, it has
     assert(progress || busy);

This assert in fact checks nothing, because before the poll (20 lines or
so higher), it has
     if (!busy) {
         return progress;
     }

and as 'busy' is not altered, busy must be true at the assert.

Is this in fact meant to be checking that we have made progress if aio_poll
is called blocking? I.e. assert(progress || !blocking)?
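
For reference, here is a rough paraphrase of the control flow I am
talking about (reconstructed from memory, not a verbatim copy of
aio-posix.c; it also shows where 'busy' comes from, which I return to
below):

    /* sketch of aio_poll(), names and ordering approximate */
    bool aio_poll(AioContext *ctx, bool blocking)
    {
        bool progress = false;
        bool busy = false;
        AioHandler *node;

        /* ... run bottom halves and any already-ready handlers;
         * these may set progress = true ... */

        QLIST_FOREACH(node, &ctx->aio_handlers, node) {
            /* 'busy' is only set when a handler's io_flush()
             * reports pending I/O */
            if (node->io_flush && node->io_flush(node->opaque)) {
                busy = true;
            }
            /* ... add the node's fd and events to ctx->pollfds ... */
        }

        if (!busy) {
            return progress;          /* early exit: g_poll() never runs */
        }

        /* ... g_poll() on ctx->pollfds, then dispatch ready handlers ... */

        assert(progress || busy);     /* trivially true: busy unchanged */
        return progress;
    }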

Secondly, the tests I am writing for the timer fail because g_poll is
not called, because busy is false. This is because I haven't got an
fd set up. But in the instance where aio_poll is called with blocking=true
and there are no fd's to wait on, surely aio_poll should either hang
indefinitely or (perhaps better) assert, rather than return immediately
which is a recipe for an unexpected busywait? If I have timers, it should
be running those timers rather than returning.
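
What I would have expected is something like this, very roughly
(hypothetical helper names, only to illustrate the idea, not working
code):

    /* hypothetical: let the nearest timer deadline bound a blocking
     * g_poll() instead of returning early when no fd is 'busy' */
    int timeout_ms;
    if (blocking) {
        int64_t deadline_ns = aio_nearest_timer_deadline(ctx); /* assumed helper */
        timeout_ms = deadline_ns < 0 ? -1                      /* no timer: block */
                                     : (int)(deadline_ns / SCALE_MS);
    } else {
        timeout_ms = 0;
    }
    int ret = g_poll((GPollFD *)ctx->pollfds->data, ctx->pollfds->len,
                     timeout_ms);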

Thirdly, I don't quite understand how/why busy is being set. It seems
to be set if the flush callback returns non-zero. That would imply (I think)
the fd handler has something to write. But what if it is just interested
in any data to read that is available (and never writes)? If this is the
only fd aio_poll has, it would appear it never polls.

When adapting this for timers, I think it should be returning true only
if an AIO dispatch did something, or a BH was executed, or a timer ran.
Specifically if the poll simply times out, it should not be returning
true unless a timer ran. Correct?
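
In other words something along these lines at the end of aio_poll
(sketch only; the two helper names are made up):

    /* sketch: only report progress when something actually ran */
    progress |= aio_bh_poll(ctx);          /* a BH was executed */
    progress |= dispatch_ready_fds(ctx);   /* hypothetical: an fd handler ran */
    progress |= run_expired_timers(ctx);   /* hypothetical: a timer fired */
    return progress;                       /* a bare poll timeout stays false */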

-- 
Alex Bligh

* Re: [Qemu-devel] Question on aio_poll
From: Stefan Hajnoczi @ 2013-07-23 12:18 UTC (permalink / raw)
  To: Alex Bligh; +Cc: qemu-devel

On Sat, Jul 20, 2013 at 02:14:43PM +0100, Alex Bligh wrote:
> As part of adding timer operations to aio_poll and a clock to AioContext, I
> am trying to figure out a couple of points on how aio_poll works. This
> primarily revolves around the flag busy.
> 
> Firstly, at the end of aio_poll, it has
>     assert(progress || busy);
> 
> This assert in fact checks nothing, because before the poll (20 lines or
> so higher), it has
>     if (!busy) {
>         return progress;
>     }
> 
> and as 'busy' is not altered, busy must be true at the assert.
> 
> Is this in fact meant to be checking that we have made progress if aio_poll
> is called blocking? I.e. assert(progress || !blocking)?
> 
> Secondly, the tests I am writing for the timer fail because g_poll is
> not called, because busy is false. This is because I haven't got an
> fd set up. But in the instance where aio_poll is called with blocking=true
> and there are no fd's to wait on, surely aio_poll should either hang
> indefinitely or (perhaps better) assert, rather than return immediately
> which is a recipe for an unexpected busywait? If I have timers, it should
> be running those timers rather than returning.

FWIW this goes away in my series that gets rid of .io_flush():

http://lists.nongnu.org/archive/html/qemu-devel/2013-07/msg00092.html

Unfortunately there is an issue with the series which I haven't had time
to look into yet.  I don't remember the details but I think make check
is failing.

The current qemu.git/master code is doing the "correct" thing though.
Callers of aio_poll() are using it to complete any pending I/O requests
and process BHs.  If there is no work left, we do not want to block
indefinitely.  Instead we want to return.
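
Think of the typical caller, which looks roughly like this (sketch):

    /* drain: keep servicing completions and BHs while work remains */
    while (aio_poll(ctx, true)) {
        /* progress was made, go round again */
    }
    /* falls out here as soon as aio_poll() reports no pending work */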

> Thirdly, I don't quite understand how/why busy is being set. It seems
> to be set if the flush callback returns non-zero. That would imply (I think)
> the fd handler has something to write. But what if it is just interested
> in any data to read that is available (and never writes)? If this is the
> only fd aio_poll has, it would appear it never polls.

The point of .io_flush() is to select file descriptors that are awaiting
I/O (either direction).  For example, consider an iSCSI TCP socket with
no I/O requests pending.  In that case .io_flush() returns 0 and we will
not block in aio_poll().  But if there is an iSCSI request pending, then
.io_flush() will return 1 and we'll wait for the iSCSI response to be
received.

The effect of .io_flush() is that aio_poll() will return false if there
is no I/O pending.
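
A typical .io_flush() implementation is just a pending-request check,
something like this (illustrative only, hypothetical driver names):

    /* .io_flush(): report whether this fd has I/O in flight */
    static int mydriver_aio_flush(void *opaque)
    {
        MyDriverState *s = opaque;

        /* non-zero: aio_poll() should wait on this fd;
         * zero: nothing pending, do not block on its account */
        return s->in_flight_requests > 0;
    }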

It turned out that this behavior could be implemented at the block layer
instead of using the .io_flush() interface at the AioContext layer.  The
patch series I linked to above modifies the code so AioContext can
eliminate the .io_flush() concept.

> When adapting this for timers, I think it should be returning true only
> if an AIO dispatch did something, or a BH was executed, or a timer ran.
> Specifically if the poll simply times out, it should not be returning
> true unless a timer ran. Correct?

Yes, the return value is about making progress.  If there is still work
pending then it must return true.  If there is no work pending it must
return false.

Stefan

* Re: [Qemu-devel] Question on aio_poll
From: Alex Bligh @ 2013-07-23 14:46 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel, Alex Bligh

Stefan,

--On 23 July 2013 14:18:25 +0200 Stefan Hajnoczi <stefanha@gmail.com> wrote:

>> Secondly, the tests I am writing for the timer fail because g_poll is
>> not called, because busy is false. This is because I haven't got an
>> fd set up. But in the instance where aio_poll is called with
>> blocking=true and there are no fd's to wait on, surely aio_poll should
>> either hang indefinitely or (perhaps better) assert, rather than return
>> immediately which is a recipe for an unexpected busywait? If I have
>> timers, it should be running those timers rather than returning.
>
> FWIW this goes away in my series that gets rid of .io_flush():
>
> http://lists.nongnu.org/archive/html/qemu-devel/2013-07/msg00092.html

Oooh another overlapping patch series into the mix :-)

> Unfortunately there is an issue with the series which I haven't had time
> to look into yet.  I don't remember the details but I think make check
> is failing.
>
> The current qemu.git/master code is doing the "correct" thing though.
> Callers of aio_poll() are using it to complete any pending I/O requests
> and process BHs.  If there is no work left, we do not want to block
> indefinitely.  Instead we want to return.

If we have no work to do (no FDs) and have a timer, then this should
wait for the timer to expire (i.e. wait until progress has been
made). Hence without a timer, it would be peculiar if it returned
earlier.

I think it should behave like select really, i.e. if you give it
an infinite timeout (blocking) and no descriptors to work on, it hangs
for ever. At the very least it should warn, as this is in my opinion
an error by the caller.

I left this how it was in the end (I think), and got round it by
creating a bogus pipe for the test to listen to.
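
Roughly what the test now does (abridged sketch, assuming the current
pre-io_flush-removal aio_set_fd_handler() signature; 'ctx' is the
test's AioContext as in tests/test-aio.c):

    static int dummy_io_flush(void *opaque)
    {
        return 1;   /* always claim pending I/O so aio_poll() will block */
    }

    static void dummy_read(void *opaque)
    {
        /* never called: nothing is ever written to the pipe */
    }

    static void test_timer_with_bogus_pipe(void)
    {
        int fds[2];

        g_assert(pipe(fds) == 0);
        /* register the read end so g_poll() has something to wait on */
        aio_set_fd_handler(ctx, fds[0], dummy_read, NULL,
                           dummy_io_flush, NULL);
        /* ... arm the timer and call aio_poll(ctx, true) ... */
    }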

>> Thirdly, I don't quite understand how/why busy is being set. It seems
>> to be set if the flush callback returns non-zero. That would imply (I
>> think) the fd handler has something to write. But what if it is just
>> interested in any data to read that is available (and never writes)? If
>> this is the only fd aio_poll has, it would appear it never polls.
>
> The point of .io_flush() is to select file descriptors that are awaiting
> I/O (either direction).  For example, consider an iSCSI TCP socket with
> no I/O requests pending.  In that case .io_flush() returns 0 and we will
> not block in aio_poll().  But if there is an iSCSI request pending, then
> .io_flush() will return 1 and we'll wait for the iSCSI response to be
> received.
>
> The effect of .io_flush() is that aio_poll() will return false if there
> is no I/O pending.

Right, but take that example. If the tcp socket is idle because it's an
iSCSI server and it is waiting for an iSCSI request, then io_flush
returns 0. That will mean busy will not be set, and if it's the only
FD, g_poll won't be called AT ALL - forget the fact it won't block -
because it will exit aio_poll a couple of lines before the g_poll. That
means you'll never actually poll for the incoming iSCSI command.
Surely that can't be right!

Or are you saying that this type of FD never appears in the aio poll
set, so it is just returning for the main loop to handle them?

> It turned out that this behavior could be implemented at the block layer
> instead of using the .io_flush() interface at the AioContext layer.  The
> patch series I linked to above modifies the code so AioContext can
> eliminate the .io_flush() concept.

I've just had a quick read of that.

I think the key one is:
  http://lists.nongnu.org/archive/html/qemu-devel/2013-07/msg00099.html

I note you've eliminated 'busy' - hurrah.

I note you now have:
     if (ctx->pollfds->len == 1) {
         return progress;
     }

Is the '1' there the event notifier? How do we know there is only
one of them?

>> When adapting this for timers, I think it should be returning true only
>> if an AIO dispatch did something, or a BH was executed, or a timer ran.
>> Specifically if the poll simply times out, it should not be returning
>> true unless a timer ran. Correct?
>
> Yes, the return value is about making progress.  If there is still work
> pending then it must return true.  If there is no work pending it must
> return false.

Yup, you made the same fix as me at the end of aio_poll in my PATCHv2
RFC series.

On a related point, the g_source appears very fragile in respect of
false wakeups. I would not be confident that it would not busy-loop.
See the comments in the last of the patches in my series.

-- 
Alex Bligh

* Re: [Qemu-devel] Question on aio_poll
From: Stefan Hajnoczi @ 2013-07-24  7:54 UTC (permalink / raw)
  To: Alex Bligh; +Cc: qemu-devel

On Tue, Jul 23, 2013 at 03:46:23PM +0100, Alex Bligh wrote:
> --On 23 July 2013 14:18:25 +0200 Stefan Hajnoczi <stefanha@gmail.com> wrote:
> >Unfortunately there is an issue with the series which I haven't had time
> >to look into yet.  I don't remember the details but I think make check
> >is failing.
> >
> >The current qemu.git/master code is doing the "correct" thing though.
> >Callers of aio_poll() are using it to complete any pending I/O requests
> >and process BHs.  If there is no work left, we do not want to block
> >indefinitely.  Instead we want to return.
> 
> If we have no work to do (no FDs) and have a timer, then this should
> wait for the timer to expire (i.e. wait until progress has been
> made). Hence without a timer, it would be peculiar if it returned
> earlier.
> 
> I think it should behave like select really, i.e. if you give it
> an infinite timeout (blocking) and no descriptors to work on, it hangs
> for ever. At the very least it should warn, as this is in my opinion
> an error by the caller.
> 
> I left this how it was in the end (I think), and got round it by
> creating a bogus pipe for the test to listen to.

Doing that requires the changes in my patch series, otherwise you break
aio_poll() loops that are waiting for pending I/O requests.  They don't
want to wait for timers.

> >>Thirdly, I don't quite understand how/why busy is being set. It seems
> >>to be set if the flush callback returns non-zero. That would imply (I
> >>think) the fd handler has something to write. But what if it is just
> >>interested in any data to read that is available (and never writes)? If
> >>this is the only fd aio_poll has, it would appear it never polls.
> >
> >The point of .io_flush() is to select file descriptors that are awaiting
> >I/O (either direction).  For example, consider an iSCSI TCP socket with
> >no I/O requests pending.  In that case .io_flush() returns 0 and we will
> >not block in aio_poll().  But if there is an iSCSI request pending, then
> >.io_flush() will return 1 and we'll wait for the iSCSI response to be
> >received.
> >
> >The effect of .io_flush() is that aio_poll() will return false if there
> >is no I/O pending.
> 
> Right, but take that example. If the tcp socket is idle because it's an
> iSCSI server and it is waiting for an iSCSI request, then io_flush
> returns 0. That will mean busy will not be set, and if it's the only
> FD, g_poll won't be called AT ALL - forget the fact it won't block -
> because it will exit aio_poll a couple of lines before the g_poll. That
> means you'll never actually poll for the incoming iSCSI command.
> Surely that can't be right!
> 
> Or are you saying that this type of FD never appears in the aio poll
> set, so it is just returning for the main loop to handle them?

That happens because QEMU has two types of fd monitoring.  It has
AioContext's aio_poll() which is designed for asynchronous I/O requests
initiated by QEMU.  It can wait for them to complete.

QEMU also has main-loop's qemu_set_fd_handler() (iohandler) which is
used for server connections like the one you described.  The NBD server
uses it, for example.
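
Concretely, the two registration paths look like this (signatures from
memory, illustrative callback names):

    /* AioContext path: fds carrying asynchronous block I/O,
     * waited on by aio_poll() */
    aio_set_fd_handler(ctx, iscsi_sock, iscsi_read_cb, NULL,
                       iscsi_io_flush_cb, opaque);

    /* main-loop iohandler path: server/listen sockets and the like,
     * serviced by the main loop, never by aio_poll() */
    qemu_set_fd_handler(listen_sock, nbd_accept_cb, NULL, opaque);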

I hope we can eventually unify event loops and then the select function
should behave as you described.  For now though, we need to keep the
current behavior until my .io_flush() removal series or something
equivalent is merged, at least.

> >It turned out that this behavior could be implemented at the block layer
> >instead of using the .io_flush() interface at the AioContext layer.  The
> >patch series I linked to above modifies the code so AioContext can
> >eliminate the .io_flush() concept.
> 
> I've just had a quick read of that.
> 
> I think the key one is:
>  http://lists.nongnu.org/archive/html/qemu-devel/2013-07/msg00099.html
> 
> I note you've eliminated 'busy' - hurrah.
> 
> I note you now have:
>     if (ctx->pollfds->len == 1) {
>         return progress;
>     }
> 
> Is the '1' there the event notifier? How do we know there is only
> one of them?

There may be many EventNotifier instances.  That's not what matters.

Rather, it's about the aio_notify() EventNotifier.  Each AioContext has
its own EventNotifier which can be signalled with aio_notify().  The
purpose of this function is to kick an event loop that is blocking in
select()/poll().  This is necessary when another thread modifies
something that the AioContext needs to act upon, such as adding/removing
an fd.
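
For example (sketch of the idea; the setter may already call
aio_notify() internally):

    /* another thread adds an fd, then kicks the AioContext so a
     * blocked g_poll()/select() wakes up and picks up the change */
    aio_set_fd_handler(ctx, fd, read_cb, NULL, io_flush_cb, opaque);
    aio_notify(ctx);    /* signals the AioContext's own EventNotifier */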

* Re: [Qemu-devel] Question on aio_poll
From: Alex Bligh @ 2013-07-24  8:05 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel, Alex Bligh

Stefan,

--On 24 July 2013 09:54:39 +0200 Stefan Hajnoczi <stefanha@gmail.com> wrote:

>> I left this how it was in the end (I think), and got round it by
>> creating a bogus pipe for the test to listen to.
>
> Doing that requires the changes in my patch series, otherwise you break
> aio_poll() loops that are waiting for pending I/O requests.  They don't
> want to wait for timers.

Sorry I meant leaving the main code as is, and creating the bogus
pipe solely in tests/test-aio.c in my new test that tests a timer
attached to AioContext - no changes elsewhere.

> I hope we can eventually unify event loops and then the select function
> should behave as you described.  For now though, we need to keep the
> current behavior until my .io_flush() removal series or something
> equivalent is merged, at least.

OK. That's pretty much the way I went with the PATCHv2 series.

>> I note you now have:
>>     if (ctx->pollfds->len == 1) {
>>         return progress;
>>     }
>>
>> Is the '1' there the event notifier? How do we know there is only
>> one of them?
>
> There may be many EventNotifier instances.  That's not what matters.
>
> Rather, it's about the aio_notify() EventNotifier.  Each AioContext has
> its own EventNotifier which can be signalled with aio_notify().  The
> purpose of this function is to kick an event loop that is blocking in
> select()/poll().  This is necessary when another thread modifies
> something that the AioContext needs to act upon, such as adding/removing
> an fd.

Thanks

-- 
Alex Bligh
