qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* bdrv_co_pwritev: Assertion `!waited || !use_local_qiov' failed.
@ 2019-12-17 14:14 Peter Lieven
  2019-12-17 15:52 ` Kevin Wolf
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Lieven @ 2019-12-17 14:14 UTC (permalink / raw)
  To: qemu block, qemu-devel; +Cc: vsementsov

Hi all,


I have a vserver running Qemu 4.0 that seems to reproducibly hit the following assertion:

  bdrv_co_pwritev: Assertion `!waited || !use_local_qiov' failed.


I noticed that the padding code was recently reworked in commit 2e2ad02f2c.

In the new code I cannot find a similar assertion. Was the assertion wrong or why was it dropped?


I try to add some debugging code to find out what is exactly happing. Especially the requests that

are in flight when the assertion is triggered.


Thanks,

Peter




^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: bdrv_co_pwritev: Assertion `!waited || !use_local_qiov' failed.
  2019-12-17 14:14 bdrv_co_pwritev: Assertion `!waited || !use_local_qiov' failed Peter Lieven
@ 2019-12-17 15:52 ` Kevin Wolf
  2019-12-17 16:38   ` Peter Lieven
  0 siblings, 1 reply; 4+ messages in thread
From: Kevin Wolf @ 2019-12-17 15:52 UTC (permalink / raw)
  To: Peter Lieven; +Cc: vsementsov, qemu-devel, qemu block

Am 17.12.2019 um 15:14 hat Peter Lieven geschrieben:
> I have a vserver running Qemu 4.0 that seems to reproducibly hit the
> following assertion:
> 
>  bdrv_co_pwritev: Assertion `!waited || !use_local_qiov' failed.
> 
> I noticed that the padding code was recently reworked in commit
> 2e2ad02f2c. In the new code I cannot find a similar assertion. Was
> the assertion wrong or why was it dropped?

No, the assertion in the old version makes sense to me. The code goes
basically like this:

    if (head unaligned) {
        /* Make sure no new conflicting request will be started */
        mark_request_serialising()

        /* Wait if a conflicting request is already in flight */
        wait_serialising_requests()

        adjust start of the request
    }

    if (tail unaligned) {
        /* Make sure no new conflicting request will be started */
        mark_request_serialising()

        /* Wait if a conflicting request is already in flight. If we
         * already had an unaligned head, we already waited for
         * conflicting requests and no new requests may have been
         * started, so in this case this must be a no-op. */
        wait_serialising_requests()

        adjust end of the request
    }

If the assertion fails, we already waited in the "head unaligned" case,
but a new request snuck in even though we marked this request as
serialising, so the other request should have waited.

This might mean that a wait_serialising_requests() is missing somewhere.

> I try to add some debugging code to find out what is exactly happing.
> Especially the requests that are in flight when the assertion is
> triggered.

You can just have a look at bs->tracked_requests in gdb (if you can
trigger the bug in a debugging environment). However, at the time of the
assertion it's too late, we have waited for the bad request to complete,
so it's gone now. You probably need to tell
bdrv_wait_serialising_requests() that it's not supposed to wait, so you
can assert the condition already there.

(In the new code, I think the situation is different because both head
and tail are handled at once, so we don't even have two instances of
mark_request_serialising() and wait_serialising_requests() any more. But
if a bug existed previously, it probably still exists.)

Kevin



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: bdrv_co_pwritev: Assertion `!waited || !use_local_qiov' failed.
  2019-12-17 15:52 ` Kevin Wolf
@ 2019-12-17 16:38   ` Peter Lieven
  2019-12-18  9:05     ` Kevin Wolf
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Lieven @ 2019-12-17 16:38 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: vsementsov, qemu-devel, qemu block

Am 17.12.19 um 16:52 schrieb Kevin Wolf:
> Am 17.12.2019 um 15:14 hat Peter Lieven geschrieben:
>> I have a vserver running Qemu 4.0 that seems to reproducibly hit the
>> following assertion:
>>
>>   bdrv_co_pwritev: Assertion `!waited || !use_local_qiov' failed.
>>
>> I noticed that the padding code was recently reworked in commit
>> 2e2ad02f2c. In the new code I cannot find a similar assertion. Was
>> the assertion wrong or why was it dropped?
> No, the assertion in the old version makes sense to me. The code goes
> basically like this:
>
>      if (head unaligned) {
>          /* Make sure no new conflicting request will be started */
>          mark_request_serialising()
>
>          /* Wait if a conflicting request is already in flight */
>          wait_serialising_requests()
>
>          adjust start of the request
>      }
>
>      if (tail unaligned) {
>          /* Make sure no new conflicting request will be started */
>          mark_request_serialising()
>
>          /* Wait if a conflicting request is already in flight. If we
>           * already had an unaligned head, we already waited for
>           * conflicting requests and no new requests may have been
>           * started, so in this case this must be a no-op. */
>          wait_serialising_requests()
>
>          adjust end of the request
>      }
>
> If the assertion fails, we already waited in the "head unaligned" case,
> but a new request snuck in even though we marked this request as
> serialising, so the other request should have waited.
>
> This might mean that a wait_serialising_requests() is missing somewhere.
>
>> I try to add some debugging code to find out what is exactly happing.
>> Especially the requests that are in flight when the assertion is
>> triggered.
> You can just have a look at bs->tracked_requests in gdb (if you can
> trigger the bug in a debugging environment). However, at the time of the
> assertion it's too late, we have waited for the bad request to complete,
> so it's gone now. You probably need to tell
> bdrv_wait_serialising_requests() that it's not supposed to wait, so you
> can assert the condition already there.
>
> (In the new code, I think the situation is different because both head
> and tail are handled at once, so we don't even have two instances of
> mark_request_serialising() and wait_serialising_requests() any more. But
> if a bug existed previously, it probably still exists.)


Thanks for your detailed explanation. This makes things a bit clearer.

I will try reproduce and get a better insight what is happening.


 From a first look at the code I was curious that we modify req->overlap_bytes and req->overlap_offset

without the bs->reqs_lock lock held. We also modify req->serializing. All three are accessed in wait_serializing_requests

while we might modify them.


Best,

Peter





^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: bdrv_co_pwritev: Assertion `!waited || !use_local_qiov' failed.
  2019-12-17 16:38   ` Peter Lieven
@ 2019-12-18  9:05     ` Kevin Wolf
  0 siblings, 0 replies; 4+ messages in thread
From: Kevin Wolf @ 2019-12-18  9:05 UTC (permalink / raw)
  To: Peter Lieven; +Cc: pbonzini, vsementsov, qemu-devel, qemu block

Am 17.12.2019 um 17:38 hat Peter Lieven geschrieben:
> Am 17.12.19 um 16:52 schrieb Kevin Wolf:
> > Am 17.12.2019 um 15:14 hat Peter Lieven geschrieben:
> > > I have a vserver running Qemu 4.0 that seems to reproducibly hit the
> > > following assertion:
> > > 
> > >   bdrv_co_pwritev: Assertion `!waited || !use_local_qiov' failed.
> > > 
> > > I noticed that the padding code was recently reworked in commit
> > > 2e2ad02f2c. In the new code I cannot find a similar assertion. Was
> > > the assertion wrong or why was it dropped?
> > No, the assertion in the old version makes sense to me. The code goes
> > basically like this:
> > 
> >      if (head unaligned) {
> >          /* Make sure no new conflicting request will be started */
> >          mark_request_serialising()
> > 
> >          /* Wait if a conflicting request is already in flight */
> >          wait_serialising_requests()
> > 
> >          adjust start of the request
> >      }
> > 
> >      if (tail unaligned) {
> >          /* Make sure no new conflicting request will be started */
> >          mark_request_serialising()
> > 
> >          /* Wait if a conflicting request is already in flight. If we
> >           * already had an unaligned head, we already waited for
> >           * conflicting requests and no new requests may have been
> >           * started, so in this case this must be a no-op. */
> >          wait_serialising_requests()
> > 
> >          adjust end of the request
> >      }
> > 
> > If the assertion fails, we already waited in the "head unaligned" case,
> > but a new request snuck in even though we marked this request as
> > serialising, so the other request should have waited.
> > 
> > This might mean that a wait_serialising_requests() is missing somewhere.
> > 
> > > I try to add some debugging code to find out what is exactly happing.
> > > Especially the requests that are in flight when the assertion is
> > > triggered.
> > You can just have a look at bs->tracked_requests in gdb (if you can
> > trigger the bug in a debugging environment). However, at the time of the
> > assertion it's too late, we have waited for the bad request to complete,
> > so it's gone now. You probably need to tell
> > bdrv_wait_serialising_requests() that it's not supposed to wait, so you
> > can assert the condition already there.
> > 
> > (In the new code, I think the situation is different because both head
> > and tail are handled at once, so we don't even have two instances of
> > mark_request_serialising() and wait_serialising_requests() any more. But
> > if a bug existed previously, it probably still exists.)
> 
> Thanks for your detailed explanation. This makes things a bit clearer.
> I will try reproduce and get a better insight what is happening.
> 
> From a first look at the code I was curious that we modify
> req->overlap_bytes and req->overlap_offset without the bs->reqs_lock
> lock held. We also modify req->serializing. All three are accessed in
> wait_serializing_requests while we might modify them.

Hm, that might be a latent bug. I don't think it would surface today
because we're still protected by the AioContext lock, but once the last
part of Paolo's multiqueue patches is merged, it might start to bite us.

Kevin



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-12-18  9:06 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-17 14:14 bdrv_co_pwritev: Assertion `!waited || !use_local_qiov' failed Peter Lieven
2019-12-17 15:52 ` Kevin Wolf
2019-12-17 16:38   ` Peter Lieven
2019-12-18  9:05     ` Kevin Wolf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).