On 29.10.19 12:48, Vladimir Sementsov-Ogievskiy wrote:
> 29.10.2019 11:50, Max Reitz wrote:
>> On 28.10.19 12:25, Vladimir Sementsov-Ogievskiy wrote:
>>> 28.10.2019 14:04, Kevin Wolf wrote:
>>>> Am 27.10.2019 um 13:35 hat Stefan Hajnoczi geschrieben:
>>>>> On Fri, Oct 25, 2019 at 11:58:46AM +0200, Max Reitz wrote:
>>
>> [...]
>>
>>>>>> (3) Drop handle_alloc_space(), i.e. revert c8bb23cbdbe32f.
>>>>>> To my knowledge I’m the only one who has provided any benchmarks for
>>>>>> this commit, and even then I was a bit skeptical because it performs
>>>>>> well in some cases and badly in others. I concluded that it’s
>>>>>> probably worth it because the “some cases” are more likely to occur.
>>>>>>
>>>>>> Now we have this problem of corruption here (granted due to a bug in
>>>>>> the XFS driver), and another report of massively degraded
>>>>>> performance on ppc64
>>>>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1745823 – sorry, a
>>>>>> private BZ; I hate that :-/ The report is about 40 % worse
>>>>>> performance for an in-guest fio write benchmark.)
>>>>>>
>>>>>> So I have to ask the question about what the justification for
>>>>>> keeping c8bb23cbdbe32f is. How much does performance increase with
>>>>>> it actually? (On non-(ppc64+XFS) machines, obviously)
>>>>>>
>>>>>> Advantages:
>>>>>> + Trivial
>>>>>> + No layering violations
>>>>>> + We wouldn’t need to keep track of whether the kernel bug has been
>>>>>>   fixed or not
>>>>>> + Fixes the ppc64+XFS performance problem
>>>>>>
>>>>>> Disadvantages:
>>>>>> - Reverts cluster allocation performance to pre-c8bb23cbdbe32f
>>>>>>   levels, whatever that means
>>>>>
>>>>> My favorite because it is clean and simple, but Vladimir has a valid
>>>>> use-case for requiring this performance optimization, so reverting
>>>>> isn't an option.
>>>>
>>>> Vladimir also said that qcow2 subclusters would probably also solve his
>>>> problem, so maybe reverting and applying the subcluster patches instead
>>>> is a possible solution, too?
>>>
>>> I'm not sure about the SSD case; it may need the write-zeroes
>>> optimization anyway.
>>
>> What exactly do you need? Do you actually need to write zeroes (e.g.
>> because you’re storing images on block devices) or would it be
>> sufficient to just drop the COW areas when bdrv_has_zero_init() and
>> bdrv_has_zero_init_truncate() are true?
>
> Hmm, what do you mean? We need to zero COW areas. So, the original way
> is to write real zeroes, the optimized way is fallocate.. What do you
> mean by drop? Mark subclusters as zeroes in the metadata?

Why do you need to zero COW areas? For normal files, any data will read
as zero if you didn’t write anything there.

> But still we'll have COW areas in subclusters, and we'll need to
> directly zero them.. And fallocate will most probably be faster in the
> SSD ext4 case..
>
>> I’m asking because Dave Chinner said
>> (https://bugzilla.redhat.com/show_bug.cgi?id=1765547#c7) that
>> fallocate() is always slow at least with aio=native because it needs to
>> wait for all concurrent AIO writes to finish, and so it causes the AIO
>> pipeline to stall.
>>
>> (He suggested using XFS extent size hints to get the same effect as
>> write-zeroes for free, basically, but I don’t know whether that’s really
>> useful to us; as unallocated areas on XFS read back as zero anyway.)
>>
>>>> We already have some cases where the existing handle_alloc_space()
>>>> causes performance to actually become worse, and serialising requests
>>>> as a workaround isn't going to make performance any better. So even
>>>> on these grounds, keeping commit c8bb23cbdbe32f is questionable.
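For readers who do not have the commit in front of them: as far as I
understand it, handle_alloc_space() replaces the explicit zero-buffer
writes for the COW head and tail with a single write-zeroes request for
the whole new cluster, which file-posix then implements with
fallocate(). A rough standalone sketch of the two approaches, with
hypothetical helper names and plain syscalls rather than the actual
qcow2/block-layer code:

/*
 * Sketch only, not qemu code: zero_by_write() stands for the
 * pre-c8bb23cbdbe32f behaviour (a zero buffer travels down the data
 * path), zero_by_fallocate() for what the write-zeroes path asks the
 * filesystem to do instead (metadata-only zeroing of the range).
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/falloc.h> /* FALLOC_FL_ZERO_RANGE */
#include <stdlib.h>
#include <unistd.h>

static int zero_by_write(int fd, off_t off, size_t len)
{
    char *buf = calloc(1, len); /* the zeroes are actually written out */
    ssize_t ret;

    if (!buf) {
        return -1;
    }
    ret = pwrite(fd, buf, len, off);
    free(buf);
    return ret == (ssize_t)len ? 0 : -1;
}

static int zero_by_fallocate(int fd, off_t off, size_t len)
{
    /* no data is written; the filesystem marks the range as zeroed */
    return fallocate(fd, FALLOC_FL_ZERO_RANGE, off, len);
}

The question is how much the second variant really buys us over the
first one (and over not zeroing at all where bdrv_has_zero_init() lets
us get away with it), given the aio=native stalls and the XFS trouble.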
>>> Can keeping handle_alloc_space() under some config option be an option?
>>
>> Hm. A config option is weird if you’re the only one who’s going to
>> enable it. But other than that I don’t have anything against it.
>>
>
> It's just a bit easier for us to maintain a config option than an
> out-of-tree patch. On the other hand, it's not a real problem to
> maintain this one patch separately. It may return to the tree again
> once the XFS bug is fixed.

We’ll still have the problem that fallocate() must stall aio=native
requests.

Max
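P.S.: If anyone wants to experiment with Dave’s extent size hint
suggestion outside of qemu, here is a rough standalone sketch
(hypothetical helper name, minimal error handling, not qemu code) of
setting the hint on an image file via FS_IOC_FSSETXATTR. With a hint of,
say, the cluster size, XFS rounds allocations up to large aligned
extents, and the not-yet-written parts read back as zeroes without an
explicit zeroing request per COW area.

#include <linux/fs.h>   /* struct fsxattr, FS_IOC_FSGETXATTR/FSSETXATTR */
#include <sys/ioctl.h>

static int set_extent_size_hint(int fd, unsigned int hint_bytes)
{
    struct fsxattr fsx;

    if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0) {
        return -1;
    }

    fsx.fsx_xflags |= FS_XFLAG_EXTSIZE; /* enable the per-file hint */
    fsx.fsx_extsize = hint_bytes;       /* e.g. the qcow2 cluster size */

    /* XFS may refuse to change the hint once the file already has
     * extents allocated, so this wants to happen right after image
     * creation */
    return ioctl(fd, FS_IOC_FSSETXATTR, &fsx);
}

Whether that actually helps the aio=native case is exactly what I don’t
know, so treat it as something to measure, not as a recommendation.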