On 16.07.2019 at 18:24, Max Reitz wrote:
> On 16.07.19 16:40, Kevin Wolf wrote:
> > On 19.06.2019 at 17:25, Max Reitz wrote:
> >> Hi,
> >>
> >> This is v2 to “block: Keep track of parent quiescing”.
> >>
> >> Please read this cover letter, because I’m very unsure about the design
> >> in this series and I’d appreciate some comments.
> >>
> >> As Kevin wrote in his reply to that series, the actual problem is that
> >> bdrv_drain_invoke() polls on every node whenever ending a drain.  This
> >> may cause graph changes, which is actually forbidden.
> >>
> >> To solve that problem, this series makes the drain code construct a list
> >> of undrain operations that have been initiated, and then polls all of
> >> them on the root level once graph changes are acceptable.
> >>
> >> Note that I don’t like this list concept very much, so I’m open to
> >> alternatives.
> > 
> > So drain_end is different from drain_begin in that it wants to wait only
> > for all bdrv_drain_invoke() calls to complete, but not for other
> > requests that are in flight. Makes sense.
> > 
> > Though instead of managing a whole list, wouldn't a counter suffice?
> > 
> >> Furthermore, all BdrvChildRoles with BDS parents have a broken
> >> .drained_end() implementation.  The documentation clearly states that
> >> this function is not allowed to poll, but it does.  So this series
> >> changes it to a variant (using the new code) that does not poll.
> >>
> >> There is a catch, which may actually be a problem, I don’t know: The new
> >> variant of that .drained_end() does not poll at all, never.  As
> >> described above, now every bdrv_drain_invoke() returns an object that
> >> describes when it will be done and which can thus be polled for.  These
> >> objects are just discarded when using BdrvChildRole.drained_end().  That
> >> does not feel quite right.  It would probably be more correct to let
> >> BdrvChildRole.drained_end() return these objects so the top level
> >> bdrv_drained_end() can poll for their completion.
> >>
> >> I decided not to do this, for two reasons:
> >> (1) Doing so would spill the “list of objects to poll for” design to
> >>     places outside of block/io.c.  I don’t like the design very much as
> >>     it is, but I can live with it as long as it’s constrained to the
> >>     core drain code in block/io.c.
> >>     This is made worse by the fact that currently, those objects are of
> >>     type BdrvCoDrainData.  But it shouldn’t be a problem to add a new
> >>     type that is externally visible (we only need the AioContext and
> >>     whether bdrv_drain_invoke_entry() is done).
> >>
> >> (2) It seems to work as it is.
> >>
> >> The alternative would be to add the same GSList ** parameter to
> >> BdrvChildRole.drained_end() that I added in the core drain code in patch
> >> 2, and then let the .drained_end() implementation fill that with objects
> >> to poll for.  (Which would be accomplished by adding a frontend to
> >> bdrv_do_drained_end() that lets bdrv_child_cb_drained_poll() pass the
> >> parameter through.)
> >>
> >> Opinions?
> > 
> > I think I would add an int* to BdrvChildRole.drained_end() so that we
> > > can just increase the counter wherever we need to.
> 
> So you mean just polling the @bs for which a caller gave poll=true until
> the counter reaches 0?  I’ll try, sounds good (if I can get it to work).

Yes, that's what I have in mind.

We expect graph changes to happen during the polling, but I think the
caller is responsible for making sure that the top-level @bs stays
around, so we don't need to be extra careful here.

Also, bdrv_drain_invoke() is always called in the same AioContext as the
top-level drain operation, so the whole aio_context_acquire/release
stuff from this series should become unnecessary, and we don't need
atomics to access the counter either.
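
To make the counter idea a bit more concrete, here is a standalone toy
sketch. It is not QEMU code: Node, pending[] and all toy_* functions are
made up for illustration, and the "event loop" is just an array of
scheduled callbacks. It only shows the shape I have in mind: the
recursive part schedules and counts, and only the top level polls, until
the counter is back to 0.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: Node, pending[] and the toy_* functions are hypothetical
 * names for illustration; they are not QEMU's data structures or API. */

#define MAX_PENDING 16

typedef struct Node {
    struct Node *children[2];
    int ended;              /* set once this node's drain-end callback ran */
} Node;

typedef struct Pending {
    Node *node;
    int *counter;
} Pending;

static Pending pending[MAX_PENDING];
static size_t n_pending;

/* Stand-in for bdrv_drain_invoke() on drain end: schedule the callback and
 * count it, but do not poll.  A plain int is enough here because everything
 * runs in a single context. */
static void toy_drain_invoke_end(Node *n, int *counter)
{
    assert(n_pending < MAX_PENDING);
    pending[n_pending++] = (Pending){ .node = n, .counter = counter };
    (*counter)++;
}

/* Recursive part: only schedules and counts, never polls. */
static void toy_do_drained_end(Node *n, int *counter)
{
    toy_drain_invoke_end(n, counter);
    for (size_t i = 0; i < 2; i++) {
        if (n->children[i]) {
            toy_do_drained_end(n->children[i], counter);
        }
    }
}

/* Stand-in for one event loop iteration: complete one scheduled callback. */
static void toy_poll_once(void)
{
    assert(n_pending > 0);
    Pending p = pending[--n_pending];
    p.node->ended = 1;
    (*p.counter)--;
}

/* Top level: the only place that polls, and only until the counter is 0. */
static void toy_drained_end(Node *root)
{
    int counter = 0;

    toy_do_drained_end(root, &counter);
    while (counter > 0) {
        toy_poll_once();
    }
}
```

In the real code the int * would be passed through
BdrvChildRole.drained_end() as well, so that parents can bump the same
counter; the toy version leaves that part out.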

So I think this should really simplify the series a lot.

Kevin