* bdrv_co_pwritev: Assertion `!waited || !use_local_qiov' failed. @ 2019-12-17 14:14 Peter Lieven 2019-12-17 15:52 ` Kevin Wolf 0 siblings, 1 reply; 4+ messages in thread From: Peter Lieven @ 2019-12-17 14:14 UTC (permalink / raw) To: qemu block, qemu-devel; +Cc: vsementsov Hi all, I have a vserver running Qemu 4.0 that seems to reproducibly hit the following assertion: bdrv_co_pwritev: Assertion `!waited || !use_local_qiov' failed. I noticed that the padding code was recently reworked in commit 2e2ad02f2c. In the new code I cannot find a similar assertion. Was the assertion wrong or why was it dropped? I try to add some debugging code to find out what is exactly happing. Especially the requests that are in flight when the assertion is triggered. Thanks, Peter ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: bdrv_co_pwritev: Assertion `!waited || !use_local_qiov' failed. 2019-12-17 14:14 bdrv_co_pwritev: Assertion `!waited || !use_local_qiov' failed Peter Lieven @ 2019-12-17 15:52 ` Kevin Wolf 2019-12-17 16:38 ` Peter Lieven 0 siblings, 1 reply; 4+ messages in thread From: Kevin Wolf @ 2019-12-17 15:52 UTC (permalink / raw) To: Peter Lieven; +Cc: vsementsov, qemu-devel, qemu block Am 17.12.2019 um 15:14 hat Peter Lieven geschrieben: > I have a vserver running Qemu 4.0 that seems to reproducibly hit the > following assertion: > > bdrv_co_pwritev: Assertion `!waited || !use_local_qiov' failed. > > I noticed that the padding code was recently reworked in commit > 2e2ad02f2c. In the new code I cannot find a similar assertion. Was > the assertion wrong or why was it dropped? No, the assertion in the old version makes sense to me. The code goes basically like this: if (head unaligned) { /* Make sure no new conflicting request will be started */ mark_request_serialising() /* Wait if a conflicting request is already in flight */ wait_serialising_requests() adjust start of the request } if (tail unaligned) { /* Make sure no new conflicting request will be started */ mark_request_serialising() /* Wait if a conflicting request is already in flight. If we * already had an unaligned head, we already waited for * conflicting requests and no new requests may have been * started, so in this case this must be a no-op. */ wait_serialising_requests() adjust end of the request } If the assertion fails, we already waited in the "head unaligned" case, but a new request snuck in even though we marked this request as serialising, so the other request should have waited. This might mean that a wait_serialising_requests() is missing somewhere. > I try to add some debugging code to find out what is exactly happing. > Especially the requests that are in flight when the assertion is > triggered. You can just have a look at bs->tracked_requests in gdb (if you can trigger the bug in a debugging environment). However, at the time of the assertion it's too late, we have waited for the bad request to complete, so it's gone now. You probably need to tell bdrv_wait_serialising_requests() that it's not supposed to wait, so you can assert the condition already there. (In the new code, I think the situation is different because both head and tail are handled at once, so we don't even have two instances of mark_request_serialising() and wait_serialising_requests() any more. But if a bug existed previously, it probably still exists.) Kevin ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: bdrv_co_pwritev: Assertion `!waited || !use_local_qiov' failed. 2019-12-17 15:52 ` Kevin Wolf @ 2019-12-17 16:38 ` Peter Lieven 2019-12-18 9:05 ` Kevin Wolf 0 siblings, 1 reply; 4+ messages in thread From: Peter Lieven @ 2019-12-17 16:38 UTC (permalink / raw) To: Kevin Wolf; +Cc: vsementsov, qemu-devel, qemu block Am 17.12.19 um 16:52 schrieb Kevin Wolf: > Am 17.12.2019 um 15:14 hat Peter Lieven geschrieben: >> I have a vserver running Qemu 4.0 that seems to reproducibly hit the >> following assertion: >> >> bdrv_co_pwritev: Assertion `!waited || !use_local_qiov' failed. >> >> I noticed that the padding code was recently reworked in commit >> 2e2ad02f2c. In the new code I cannot find a similar assertion. Was >> the assertion wrong or why was it dropped? > No, the assertion in the old version makes sense to me. The code goes > basically like this: > > if (head unaligned) { > /* Make sure no new conflicting request will be started */ > mark_request_serialising() > > /* Wait if a conflicting request is already in flight */ > wait_serialising_requests() > > adjust start of the request > } > > if (tail unaligned) { > /* Make sure no new conflicting request will be started */ > mark_request_serialising() > > /* Wait if a conflicting request is already in flight. If we > * already had an unaligned head, we already waited for > * conflicting requests and no new requests may have been > * started, so in this case this must be a no-op. */ > wait_serialising_requests() > > adjust end of the request > } > > If the assertion fails, we already waited in the "head unaligned" case, > but a new request snuck in even though we marked this request as > serialising, so the other request should have waited. > > This might mean that a wait_serialising_requests() is missing somewhere. > >> I try to add some debugging code to find out what is exactly happing. >> Especially the requests that are in flight when the assertion is >> triggered. > You can just have a look at bs->tracked_requests in gdb (if you can > trigger the bug in a debugging environment). However, at the time of the > assertion it's too late, we have waited for the bad request to complete, > so it's gone now. You probably need to tell > bdrv_wait_serialising_requests() that it's not supposed to wait, so you > can assert the condition already there. > > (In the new code, I think the situation is different because both head > and tail are handled at once, so we don't even have two instances of > mark_request_serialising() and wait_serialising_requests() any more. But > if a bug existed previously, it probably still exists.) Thanks for your detailed explanation. This makes things a bit clearer. I will try reproduce and get a better insight what is happening. From a first look at the code I was curious that we modify req->overlap_bytes and req->overlap_offset without the bs->reqs_lock lock held. We also modify req->serializing. All three are accessed in wait_serializing_requests while we might modify them. Best, Peter ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: bdrv_co_pwritev: Assertion `!waited || !use_local_qiov' failed. 2019-12-17 16:38 ` Peter Lieven @ 2019-12-18 9:05 ` Kevin Wolf 0 siblings, 0 replies; 4+ messages in thread From: Kevin Wolf @ 2019-12-18 9:05 UTC (permalink / raw) To: Peter Lieven; +Cc: pbonzini, vsementsov, qemu-devel, qemu block Am 17.12.2019 um 17:38 hat Peter Lieven geschrieben: > Am 17.12.19 um 16:52 schrieb Kevin Wolf: > > Am 17.12.2019 um 15:14 hat Peter Lieven geschrieben: > > > I have a vserver running Qemu 4.0 that seems to reproducibly hit the > > > following assertion: > > > > > > bdrv_co_pwritev: Assertion `!waited || !use_local_qiov' failed. > > > > > > I noticed that the padding code was recently reworked in commit > > > 2e2ad02f2c. In the new code I cannot find a similar assertion. Was > > > the assertion wrong or why was it dropped? > > No, the assertion in the old version makes sense to me. The code goes > > basically like this: > > > > if (head unaligned) { > > /* Make sure no new conflicting request will be started */ > > mark_request_serialising() > > > > /* Wait if a conflicting request is already in flight */ > > wait_serialising_requests() > > > > adjust start of the request > > } > > > > if (tail unaligned) { > > /* Make sure no new conflicting request will be started */ > > mark_request_serialising() > > > > /* Wait if a conflicting request is already in flight. If we > > * already had an unaligned head, we already waited for > > * conflicting requests and no new requests may have been > > * started, so in this case this must be a no-op. */ > > wait_serialising_requests() > > > > adjust end of the request > > } > > > > If the assertion fails, we already waited in the "head unaligned" case, > > but a new request snuck in even though we marked this request as > > serialising, so the other request should have waited. > > > > This might mean that a wait_serialising_requests() is missing somewhere. > > > > > I try to add some debugging code to find out what is exactly happing. > > > Especially the requests that are in flight when the assertion is > > > triggered. > > You can just have a look at bs->tracked_requests in gdb (if you can > > trigger the bug in a debugging environment). However, at the time of the > > assertion it's too late, we have waited for the bad request to complete, > > so it's gone now. You probably need to tell > > bdrv_wait_serialising_requests() that it's not supposed to wait, so you > > can assert the condition already there. > > > > (In the new code, I think the situation is different because both head > > and tail are handled at once, so we don't even have two instances of > > mark_request_serialising() and wait_serialising_requests() any more. But > > if a bug existed previously, it probably still exists.) > > Thanks for your detailed explanation. This makes things a bit clearer. > I will try reproduce and get a better insight what is happening. > > From a first look at the code I was curious that we modify > req->overlap_bytes and req->overlap_offset without the bs->reqs_lock > lock held. We also modify req->serializing. All three are accessed in > wait_serializing_requests while we might modify them. Hm, that might be a latent bug. I don't think it would surface today because we're still protected by the AioContext lock, but once the last part of Paolo's multiqueue patches is merged, it might start to bite us. Kevin ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2019-12-18 9:06 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-12-17 14:14 bdrv_co_pwritev: Assertion `!waited || !use_local_qiov' failed Peter Lieven 2019-12-17 15:52 ` Kevin Wolf 2019-12-17 16:38 ` Peter Lieven 2019-12-18 9:05 ` Kevin Wolf
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).