On 06.11.19 11:34, Wolfgang Bumiller wrote:
> On Wed, Nov 06, 2019 at 10:37:04AM +0100, Max Reitz wrote:
>> On 06.11.19 09:32, Stefan Hajnoczi wrote:
>>> On Tue, Nov 05, 2019 at 11:02:44AM +0100, Dietmar Maurer wrote:
>>>> Example: Backup from a ceph disk (rbd_cache=false) to a local disk:
>>>>
>>>> backup_calculate_cluster_size returns 64K (correct for my local .raw
>>>> image).
>>>>
>>>> Then the backup job starts to read 64K blocks from ceph.
>>>>
>>>> But ceph always reads 4M blocks, so this is incredibly slow and
>>>> produces way too much network traffic.
>>>>
>>>> Why does backup_calculate_cluster_size not consider the block size
>>>> of the source disk?
>>>>
>>>> cluster_size = MAX(block_size_source, block_size_target)
>>
>> So Ceph always transmits 4 MB over the network, no matter what is
>> actually needed?  That sounds, well, interesting.
>
> Or at least it generates that much I/O - in the end, it can slow down
> the backup by a multi-digit factor...

Oh, so I understand ceph internally resolves the 4 MB block and then
transmits only the subcluster range.  That makes sense.

>> backup_calculate_cluster_size() doesn’t consider the source cluster
>> size because to my knowledge there is no other medium that behaves
>> this way.  So I suppose the assumption has always been that the block
>> size of the source doesn’t matter, because a partial read is always
>> possible (without having to read everything).
>
> Unless you enable qemu-side caching, this only works until the
> block/cluster size of the source exceeds the one of the target.
>
>> What would make sense to me is to increase the buffer size in general.
>> I don’t think we need to copy clusters at a time, and
>> 0e2402452f1f2042923a5 has indeed increased the copy size to 1 MB for
>> backup writes that are triggered by guest writes.  We haven’t yet
>> increased the copy size for background writes, though.  We can do
>> that, of course.  (And probably should.)
>>
>> The thing is, it just seems unnecessary to me to take the source
>> cluster size into account in general.  It seems weird that a medium
>> only allows 4 MB reads, because, well, guests aren’t going to take
>> that into account.
>
> But guests usually have a page cache, which is why in many setups qemu
> (and thereby the backup process) often doesn’t.

But this still doesn’t make sense to me.  Linux doesn’t issue 4 MB
requests just to pre-fill the page cache, does it?  And if it issues a
smaller request, there is no way for a guest device to tell it “OK,
here’s your data, but note we have a whole 4 MB chunk around it, maybe
you’d like to take that as well...?”

I understand wanting to increase the backup buffer size, but I don’t
quite understand why we’d want to increase it to the source cluster
size when the guest also has no idea what the source cluster size is.

Max
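
For concreteness, here is a minimal, untested sketch of what Dietmar’s
MAX() suggestion could look like, written against the rough shape of
backup_calculate_cluster_size() in block/backup.c.  It is illustrative
only: the real function takes just the target, so the changed signature
(passing the source node as well) and the simplified error handling are
assumptions, not actual qemu code.

#include "qemu/osdep.h"   /* MAX() */
#include "block/block.h"  /* BlockDriverState, BlockDriverInfo, bdrv_get_info() */

/* 64K fallback, matching the default in block/backup.c */
#define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)

/*
 * Sketch only: the real backup_calculate_cluster_size() queries the
 * target alone; taking the source into account is the hypothetical
 * part being discussed in this thread.
 */
static int64_t backup_calculate_cluster_size(BlockDriverState *source,
                                             BlockDriverState *target)
{
    BlockDriverInfo bdi;
    int64_t source_cluster = BACKUP_CLUSTER_SIZE_DEFAULT;
    int64_t target_cluster = BACKUP_CLUSTER_SIZE_DEFAULT;

    /* e.g. 4M for an rbd image, per this thread */
    if (bdrv_get_info(source, &bdi) >= 0 && bdi.cluster_size > 0) {
        source_cluster = bdi.cluster_size;
    }

    /* e.g. 64K for the local .raw target */
    if (bdrv_get_info(target, &bdi) >= 0 && bdi.cluster_size > 0) {
        target_cluster = bdi.cluster_size;
    }

    /* cluster_size = MAX(block_size_source, block_size_target) */
    return MAX(source_cluster, target_cluster);
}

Note that this raises the copy granularity for the whole job; whether
that is the right knob, as opposed to simply using a larger read buffer
for background copying, is exactly what is being debated above.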