On 04.02.20 20:03, Eric Blake wrote:
> On 2/4/20 11:53 AM, Max Reitz wrote:
>> On 31.01.20 18:44, Eric Blake wrote:
>>> Having two slightly-different function names for related purposes is
>>> unwieldy, especially since I envision adding yet another notion of
>>> zero support in an upcoming patch.  It doesn't help that
>>> bdrv_has_zero_init() is a misleading name (I originally thought that a
>>> driver could only return 1 when opening an already-existing image
>>> known to be all zeroes; but in reality many drivers always return 1
>>> because it only applies to a just-created image).
>>
>> I don’t find it misleading, I just find it meaningless, which then makes
>> it open to interpretation (or maybe rather s/interpretation/wishful
>> thinking/).
>>
>>> Refactor all uses
>>> to instead have a single function that returns multiple bits of
>>> information, with better naming and documentation.
>>
>> It doesn’t make sense to me.  How exactly is it unwieldy?  In the sense
>> that we have to deal with multiple rather small implementation functions
>> rather than a big one per driver?  Actually, multiple small functions
>> sound better to me – unless the three implementations share common code.
>
> Common code for dealing with encryption, backing files, and so on.  It
> felt like I had a lot of code repetition when keeping functions separate.

Well, I suppose “dealing with” means “if (encrypted || has_backing)”, so
duplicating that doesn’t seem too bad.

>> As for the callers, they only want a single flag out of the three, don’t
>> they?  If so, it doesn’t really matter for them.
>
> The qemu-img.c caller in patch 10 checks ZERO_CREATE | ZERO_OPEN, so we
> DO have situations of checking more than one bit, vs. needing two
> function calls.

Hm, OK.  Not sure whether that place would look worse with two function
calls, but, well.

>> In fact, I can imagine that drivers can trivially return
>> BDRV_ZERO_TRUNCATE information (because the preallocation mode is
>> fixed), whereas BDRV_ZERO_CREATE can be a bit more involved, and
>> BDRV_ZERO_OPEN could take even more time because some (constant-time)
>> inquiries have to be done.
>
> In looking at the rest of the series, drivers were either completely
> trivial (in which case, declaring:
>
> .bdrv_has_zero_init = bdrv_has_zero_init_1,
> .bdrv_has_zero_init_truncate = bdrv_has_zero_init_1,
>
> was a lot wordier than the new:
>
> .bdrv_known_zeroes = bdrv_known_zeroes_truncate,

Not sure that’s bad, though.

> ), or completely spelled out but where both creation and truncation were
> determined in the same amount of effort.

Well, usually the effort is minimal, but OK.

>> And thus callers which just want the trivially obtainable
>> BDRV_ZERO_TRUNCATE info have to wait for the BDRV_ZERO_OPEN inquiry,
>> even though they don’t care about that flag.
>
> True, but only to a minor extent; and the documentation mentions that
> the BDRV_ZERO_OPEN calculation MUST NOT be as expensive as a blind
> block_status loop.

So it must be less expensive than an arbitrarily complex loop.  I think
a single SEEK_DATA/HOLE call was something like O(n) on tmpfs?  What I’m
trying to say is that this is not a good limit and can mean anything.
I do think this limit definition makes sense for callers that want to
know about ZERO_OPEN.  But I don’t see why we would have to make other
callers wait, too.

> Meanwhile, callers tend to only care about
> bdrv_known_zeroes() right after opening an image or right before
> resizing (not repeatedly during runtime);

Hm, yes.
I was thinking of parallels, but that only checks once in
parallels_open(), so it’s OK.

> and you also argued elsewhere
> in this thread that it may be worth having the block layer cache
> BDRV_ZERO_OPEN and update the cache on any write,

I didn’t say the block layer, but OK, if it makes sense.

> at which point, the
> expense in the driver callback really is a one-time call during
> bdrv_co_open().

It definitely doesn’t make sense to me to do that call unconditionally
in bdrv_co_open().

> And in that case, whether the one-time expense is done
> via a single function call or via three driver callbacks, the amount of
> work is the same; but the driver callback interface is easier if there
> is only one callback (similar to how bdrv_unallocated_blocks_are_zero()
> calls bdrv_get_info() only for bdi.unallocated_blocks_are_zero, even
> though BlockDriverInfo tracks much more than that boolean).
>
> In fact, it may be worth consolidating known zeroes support into
> BlockDriverInfo.

I’m very skeptical of that.  BDI already has the problem that it doesn’t
know which parts of the information the caller actually wants, and that
it is sometimes used in a quasi-hot path.  Maybe that means it is indeed
time to incorporate it into BDI, but then the caller should have a way
of specifying what parts of BDI it actually needs, so that drivers can
skip anything that isn’t trivially obtainable and that the caller
doesn’t need.

>> So I’d leave it as separate functions so drivers can feel free to have
>> implementations for BDRV_ZERO_OPEN that take more than mere microseconds
>> but that are more accurate.
>>
>> (Or maybe if you really want it to be a single function, callers could
>> pass the mask of flags they care about.  If all flags are trivially
>> obtainable, the implementations would then simply create their result
>> mask and & it with the caller-given mask.  For implementations where
>> some branches could take a bit more time, those branches are only taken
>> when the caller cares about the given flag.  But again, I don’t
>> necessarily think having a single function is more easily handleable
>> than three smaller ones.)
>
> Those are still viable options, but before I repaint the bikeshed along
> those lines, I'd at least like a review of whether the overall idea of
> having a notion of 'reads-all-zeroes' is indeed useful enough,
> regardless of how we implement it as one vs. three driver callbacks.

I’m as hesitant as ever to give a review stating that this notion is
useful, because I haven’t seen a practical example yet where the problem
isn’t the fact that NBD doesn’t have 64-bit write_zeroes support.  So
far, it looks to me like this notion is only really useful for cases
where we expect a management layer on top of qemu anyway.  And then I’m
not sure that this new feature works reliably enough for such a
management layer.

(I’m not saying it isn’t useful.  Again, intuitively it does seem
useful.  Intuition can be enough to merge a sufficiently simple series
that doesn’t increase code complexity too much.  But I’m still asking
for actual practical examples, because that would make a better
argument, of course.)

Max
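
P.S.: To make the mask idea from my quoted text above a bit more
concrete, here is a rough sketch (all names and signatures are
hypothetical, this is not a concrete interface proposal):

    /* Caller passes only the flags it actually cares about: */
    int bdrv_known_zeroes(BlockDriverState *bs, int mask);

    /* Hypothetical driver where only the ZERO_OPEN bit is costly: */
    static int example_known_zeroes(BlockDriverState *bs, int mask)
    {
        /* Creation and truncation zeroing are trivially known here */
        int ret = BDRV_ZERO_CREATE | BDRV_ZERO_TRUNCATE;

        if (mask & BDRV_ZERO_OPEN) {
            /* Only do the more expensive inquiry when the caller
             * actually asked for this flag */
            if (example_image_is_all_zeroes(bs)) {
                ret |= BDRV_ZERO_OPEN;
            }
        }

        return ret & mask;
    }

(example_image_is_all_zeroes() is just a stand-in for whatever
driver-specific check would be needed.)  Callers that only care about,
say, BDRV_ZERO_TRUNCATE would then never pay for the ZERO_OPEN inquiry.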