Am 17.12.2020 um 10:37 hat Sergio Lopez geschrieben: > On Wed, Dec 16, 2020 at 07:31:02PM +0100, Kevin Wolf wrote: > > Am 16.12.2020 um 15:55 hat Sergio Lopez geschrieben: > > > On Wed, Dec 16, 2020 at 01:35:14PM +0100, Kevin Wolf wrote: > > > > Am 15.12.2020 um 18:23 hat Sergio Lopez geschrieben: > > > > > On Tue, Dec 15, 2020 at 04:01:19PM +0100, Kevin Wolf wrote: > > > > > > Am 15.12.2020 um 14:15 hat Sergio Lopez geschrieben: > > > > > > > On Tue, Dec 15, 2020 at 01:12:33PM +0100, Kevin Wolf wrote: > > > > > > > > Am 14.12.2020 um 18:05 hat Sergio Lopez geschrieben: > > > > > > > > > While processing the parents of a BDS, one of the parents may process > > > > > > > > > the child that's doing the tail recursion, which leads to a BDS being > > > > > > > > > processed twice. This is especially problematic for the aio_notifiers, > > > > > > > > > as they might attempt to work on both the old and the new AIO > > > > > > > > > contexts. > > > > > > > > > > > > > > > > > > To avoid this, add the BDS pointer to the ignore list, and check the > > > > > > > > > child BDS pointer while iterating over the children. > > > > > > > > > > > > > > > > > > Signed-off-by: Sergio Lopez > > > > > > > > > > > > > > > > Ugh, so we get a mixed list of BdrvChild and BlockDriverState? :-/ > > > > > > > > > > > > > > I know, it's effective but quite ugly... > > > > > > > > > > > > > > > What is the specific scenario where you saw this breaking? Did you have > > > > > > > > multiple BdrvChild connections between two nodes so that we would go to > > > > > > > > the parent node through one and then come back to the child node through > > > > > > > > the other? > > > > > > > > > > > > > > I don't think this is a corner case. If the graph is walked top->down, > > > > > > > there's no problem since children are added to the ignore list before > > > > > > > getting processed, and siblings don't process each other. But, if the > > > > > > > graph is walked bottom->up, a BDS will start processing its parents > > > > > > > without adding itself to the ignore list, so there's nothing > > > > > > > preventing them from processing it again. > > > > > > > > > > > > I don't understand. child is added to ignore before calling the parent > > > > > > callback on it, so how can we come back through the same BdrvChild? > > > > > > > > > > > > QLIST_FOREACH(child, &bs->parents, next_parent) { > > > > > > if (g_slist_find(*ignore, child)) { > > > > > > continue; > > > > > > } > > > > > > assert(child->klass->set_aio_ctx); > > > > > > *ignore = g_slist_prepend(*ignore, child); > > > > > > child->klass->set_aio_ctx(child, new_context, ignore); > > > > > > } > > > > > > > > > > Perhaps I'm missing something, but the way I understand it, that loop > > > > > is adding the BdrvChild pointer of each of its parents, but not the > > > > > BdrvChild pointer of the BDS that was passed as an argument to > > > > > b_s_a_c_i. > > > > > > > > Generally, the caller has already done that. > > > > > > > > In the theoretical case that it was the outermost call in the recursion > > > > and it hasn't (I couldn't find any such case), I think we should still > > > > call the callback for the passed BdrvChild like we currently do. > > > > > > > > > > You didn't dump the BdrvChild here. I think that would add some > > > > > > information on why we re-entered 0x555ee2fbf660. Maybe you can also add > > > > > > bs->drv->format_name for each node to make the scenario less abstract? > > > > > > > > > > I've generated another trace with more data: > > > > > > > > > > bs=0x565505e48030 (backup-top) enter > > > > > bs=0x565505e48030 (backup-top) processing children > > > > > bs=0x565505e48030 (backup-top) calling bsaci child=0x565505e42090 (child->bs=0x565505e5d420) > > > > > bs=0x565505e5d420 (qcow2) enter > > > > > bs=0x565505e5d420 (qcow2) processing children > > > > > bs=0x565505e5d420 (qcow2) calling bsaci child=0x565505e41ea0 (child->bs=0x565505e52060) > > > > > bs=0x565505e52060 (file) enter > > > > > bs=0x565505e52060 (file) processing children > > > > > bs=0x565505e52060 (file) processing parents > > > > > bs=0x565505e52060 (file) processing itself > > > > > bs=0x565505e5d420 (qcow2) processing parents > > > > > bs=0x565505e5d420 (qcow2) calling set_aio_ctx child=0x5655066a34d0 > > > > > bs=0x565505fbf660 (qcow2) enter > > > > > bs=0x565505fbf660 (qcow2) processing children > > > > > bs=0x565505fbf660 (qcow2) calling bsaci child=0x565505e41d20 (child->bs=0x565506bc0c00) > > > > > bs=0x565506bc0c00 (file) enter > > > > > bs=0x565506bc0c00 (file) processing children > > > > > bs=0x565506bc0c00 (file) processing parents > > > > > bs=0x565506bc0c00 (file) processing itself > > > > > bs=0x565505fbf660 (qcow2) processing parents > > > > > bs=0x565505fbf660 (qcow2) calling set_aio_ctx child=0x565505fc7aa0 > > > > > bs=0x565505fbf660 (qcow2) calling set_aio_ctx child=0x5655068b8510 > > > > > bs=0x565505e48030 (backup-top) enter > > > > > bs=0x565505e48030 (backup-top) processing children > > > > > bs=0x565505e48030 (backup-top) calling bsaci child=0x565505e3c450 (child->bs=0x565505fbf660) > > > > > bs=0x565505fbf660 (qcow2) enter > > > > > bs=0x565505fbf660 (qcow2) processing children > > > > > bs=0x565505fbf660 (qcow2) processing parents > > > > > bs=0x565505fbf660 (qcow2) processing itself > > > > > bs=0x565505e48030 (backup-top) processing parents > > > > > bs=0x565505e48030 (backup-top) calling set_aio_ctx child=0x565505e402d0 > > > > > bs=0x565505e48030 (backup-top) processing itself > > > > > bs=0x565505fbf660 (qcow2) processing itself > > > > > > > > Hm, is this complete? Is see no "processing itself" for > > > > bs=0x565505e5d420. Or is this because it crashed before getting there? > > > > > > Yes, it crashes there. I forgot to mention that, sorry. > > > > > > > Anyway, trying to reconstruct the block graph with BdrvChild pointers > > > > annotated at the edges: > > > > > > > > BlockBackend > > > > | > > > > v > > > > backup-top ------------------------+ > > > > | | | > > > > | +-----------------------+ | > > > > | 0x5655068b8510 | | 0x565505e3c450 > > > > | | | > > > > | 0x565505e42090 | | > > > > v | | > > > > qcow2 ---------------------+ | | > > > > | | | | > > > > | 0x565505e52060 | | | ??? [1] > > > > | | | | | > > > > v 0x5655066a34d0 | | | | 0x565505fc7aa0 > > > > file v v v v > > > > qcow2 (backing) > > > > | > > > > | 0x565505e41d20 > > > > v > > > > file > > > > > > > > [1] This seems to be a BdrvChild with a non-BDS parent. Probably a > > > > BdrvChild directly owned by the backup job. > > > > > > > > > So it seems this is happening: > > > > > > > > > > backup-top (5e48030) <---------| (5) > > > > > | | | > > > > > | | (6) ------------> qcow2 (5fbf660) > > > > > | ^ | > > > > > | (3) | | (4) > > > > > |-> (1) qcow2 (5e5d420) ----- |-> file (6bc0c00) > > > > > | > > > > > |-> (2) file (5e52060) > > > > > > > > > > backup-top (5e48030), the BDS that was passed as argument in the first > > > > > bdrv_set_aio_context_ignore() call, is re-entered when qcow2 (5fbf660) > > > > > is processing its parents, and the latter is also re-entered when the > > > > > first one starts processing its children again. > > > > > > > > Yes, but look at the BdrvChild pointers, it is through different edges > > > > that we come back to the same node. No BdrvChild is used twice. > > > > > > > > If backup-top had added all of its children to the ignore list before > > > > calling into the overlay qcow2, the backing qcow2 wouldn't eventually > > > > have called back into backup-top. > > > > > > I've tested a patch that first adds every child to the ignore list, > > > and then processes those that weren't there before, as you suggested > > > on a previous email. With that, the offending qcow2 is not re-entered, > > > so we avoid the crash, but backup-top is still entered twice: > > > > I think we also need to every parent to the ignore list before calling > > callbacks, though it doesn't look like this is the problem you're > > currently seeing. > > I agree. > > > > bs=0x560db0e3b030 (backup-top) enter > > > bs=0x560db0e3b030 (backup-top) processing children > > > bs=0x560db0e3b030 (backup-top) calling bsaci child=0x560db0e2f450 (child->bs=0x560db0fb2660) > > > bs=0x560db0fb2660 (qcow2) enter > > > bs=0x560db0fb2660 (qcow2) processing children > > > bs=0x560db0fb2660 (qcow2) calling bsaci child=0x560db0e34d20 (child->bs=0x560db1bb3c00) > > > bs=0x560db1bb3c00 (file) enter > > > bs=0x560db1bb3c00 (file) processing children > > > bs=0x560db1bb3c00 (file) processing parents > > > bs=0x560db1bb3c00 (file) processing itself > > > bs=0x560db0fb2660 (qcow2) calling bsaci child=0x560db16964d0 (child->bs=0x560db0e50420) > > > bs=0x560db0e50420 (qcow2) enter > > > bs=0x560db0e50420 (qcow2) processing children > > > bs=0x560db0e50420 (qcow2) calling bsaci child=0x560db0e34ea0 (child->bs=0x560db0e45060) > > > bs=0x560db0e45060 (file) enter > > > bs=0x560db0e45060 (file) processing children > > > bs=0x560db0e45060 (file) processing parents > > > bs=0x560db0e45060 (file) processing itself > > > bs=0x560db0e50420 (qcow2) processing parents > > > bs=0x560db0e50420 (qcow2) processing itself > > > bs=0x560db0fb2660 (qcow2) processing parents > > > bs=0x560db0fb2660 (qcow2) calling set_aio_ctx child=0x560db1672860 > > > bs=0x560db0fb2660 (qcow2) calling set_aio_ctx child=0x560db1b14a20 > > > bs=0x560db0e3b030 (backup-top) enter > > > bs=0x560db0e3b030 (backup-top) processing children > > > bs=0x560db0e3b030 (backup-top) processing parents > > > bs=0x560db0e3b030 (backup-top) calling set_aio_ctx child=0x560db0e332d0 > > > bs=0x560db0e3b030 (backup-top) processing itself > > > bs=0x560db0fb2660 (qcow2) processing itself > > > bs=0x560db0e3b030 (backup-top) calling bsaci child=0x560db0e35090 (child->bs=0x560db0e50420) > > > bs=0x560db0e50420 (qcow2) enter > > > bs=0x560db0e3b030 (backup-top) processing parents > > > bs=0x560db0e3b030 (backup-top) processing itself > > > > > > I see that "blk_do_set_aio_context()" passes "blk->root" to > > > "bdrv_child_try_set_aio_context()" so it's already in the ignore list, > > > so I'm not sure what's happening here. Is backup-top is referenced > > > from two different BdrvChild or is "blk->root" not pointing to > > > backup-top's BDS? > > > > The second time that backup-top is entered, it is not as the BDS of > > blk->root, but as the parent node of the overlay qcow2. Which is > > interesting, because last time it was still the backing qcow2, so the > > change did have _some_ effect. > > > > The part that I don't understand is why you still get the line with > > child=0x560db1b14a20, because when you add all children to the ignore > > list first, that should have been put into the ignore list as one of the > > first things in the whole process (when backup-top was first entered). > > > > Is 0x560db1b14a20 a BdrvChild that has backup-top as its opaque value, > > but isn't actually present in backup-top's bs->children? > > Exactly, that line corresponds to this chunk of code: > > <---- begin ----> > QLIST_FOREACH(child, &bs->parents, next_parent) { > if (g_slist_find(*ignore, child)) { > continue; > } > assert(child->klass->set_aio_ctx); > *ignore = g_slist_prepend(*ignore, child); > fprintf(stderr, "bs=%p (%s) calling set_aio_ctx child=%p\n", bs, bs->drv->format_name, child); > child->klass->set_aio_ctx(child, new_context, ignore); > } > <---- end ----> > > Do you think it's safe to re-enter backup-top, or should we look for a > way to avoid this? I think it should be avoided, but I don't understand why putting all children of backup-top into the ignore list doesn't already avoid it. If backup-top is in the parents list of qcow2, then qcow2 should be in the children list of backup-top and therefore the BdrvChild should already be in the ignore list. The only way I can explain this is that backup-top and qcow2 have different ideas about which BdrvChild objects exist that connect them. Or that the graph changes between both places, but I don't see how that could happen in bdrv_set_aio_context_ignore(). Kevin