On Thu, Apr 06, 2017 at 06:01:48PM +0300, Alberto Garcia wrote:
> Here are the results (subcluster size in brackets):
>
> |-----------------+----------------+-----------------+-------------------|
> | cluster size    | subclusters=on | subclusters=off | Max L2 cache size |
> |-----------------+----------------+-----------------+-------------------|
> | 2 MB (256 KB)   | 440 IOPS       | 100 IOPS        | 160 KB (*)        |
> | 512 KB (64 KB)  | 1000 IOPS      | 300 IOPS        | 640 KB            |
> | 64 KB (8 KB)    | 3000 IOPS      | 1000 IOPS       | 5 MB              |
> | 32 KB (4 KB)    | 12000 IOPS     | 1300 IOPS       | 10 MB             |
> | 4 KB (512 B)    | 100 IOPS       | 100 IOPS        | 80 MB             |
> |-----------------+----------------+-----------------+-------------------|
>
>   (*) The L2 cache must be a multiple of the cluster
>       size, so in this case it must be 2MB. On the table
>       I chose to show how much of those 2MB are actually
>       used so you can compare it with the other cases.
>
> Some comments about the results:
>
> - For the 64KB, 512KB and 2MB cases, having subclusters increases
>   write performance roughly by three. This happens because for each
>   cluster allocation there's less data to copy from the backing
>   image. For the same reason, the smaller the cluster, the better the
>   performance. As expected, 64KB clusters with no subclusters perform
>   roughly the same as 512KB clusters with 64KB subclusters.
>
> - The 32KB case is the most interesting one. Without subclusters it's
>   not very different from the 64KB case, but having a subcluster with
>   the same size of the I/O block eliminates the need for COW entirely
>   and the performance skyrockets (10 times faster!).
>
> - 4KB is however very slow. I attribute this to the fact that the
>   cluster size is so small that a new cluster needs to be allocated
>   for every single write and its refcount updated accordingly. The L2
>   and refcount tables are also so small that they are too inefficient
>   and need to grow all the time.
>
> Here are the results when writing to an empty 40GB qcow2 image with no
> backing file. The numbers are of course different but as you can see
> the patterns are similar:
>
> |-----------------+----------------+-----------------+-------------------|
> | cluster size    | subclusters=on | subclusters=off | Max L2 cache size |
> |-----------------+----------------+-----------------+-------------------|
> | 2 MB (256 KB)   | 1200 IOPS      | 255 IOPS        | 160 KB            |
> | 512 KB (64 KB)  | 3000 IOPS      | 700 IOPS        | 640 KB            |
> | 64 KB (8 KB)    | 7200 IOPS      | 3300 IOPS       | 5 MB              |
> | 32 KB (4 KB)    | 12300 IOPS     | 4200 IOPS       | 10 MB             |
> | 4 KB (512 B)    | 100 IOPS       | 100 IOPS        | 80 MB             |
> |-----------------+----------------+-----------------+-------------------|

I don't understand why subclusters=on performs so much better when
there's no backing file.  Is qcow2 zeroing out the 64 KB cluster with
subclusters=off?

It ought to just write the 4 KB data when a new cluster is touched.
Therefore the performance should be very similar to subclusters=on.

Stefan
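
For reference, here is a rough sketch of the arithmetic behind the question above. It is not taken from the qcow2 code. It assumes every request is an aligned 4 KB write that lands in a fresh allocation unit, that the virtual disk is 40 GB in both tests, and that each L2 entry takes 8 bytes (that value reproduces the "Max L2 cache size" column). The names fill_bytes and l2_cache_bytes are made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

IO_SIZE = 4 * KIB        # assumed request size ("write the 4 KB data")
IMAGE_SIZE = 40 * GIB    # virtual size quoted for the test image
L2_ENTRY_SIZE = 8        # assumption: 8 bytes per L2 entry, matches the table


def fill_bytes(alloc_unit, io_size=IO_SIZE):
    """Bytes around one aligned write that would have to be copied from a
    backing file (or zeroed) if whole allocation units must be filled."""
    units = -(-io_size // alloc_unit)  # ceiling division
    return units * alloc_unit - io_size


def l2_cache_bytes(cluster_size, image_size=IMAGE_SIZE):
    """L2 metadata needed to map the whole image: one entry per cluster."""
    return (image_size // cluster_size) * L2_ENTRY_SIZE


# (cluster size, subcluster size) pairs from the tables above
CONFIGS = [
    (2 * MIB, 256 * KIB),
    (512 * KIB, 64 * KIB),
    (64 * KIB, 8 * KIB),
    (32 * KIB, 4 * KIB),
    (4 * KIB, 512),
]

for cluster, subcluster in CONFIGS:
    print(
        f"cluster={cluster // KIB:>5} KB  "
        f"fill(subclusters=off)={fill_bytes(cluster) // KIB:>5} KB  "
        f"fill(subclusters=on)={fill_bytes(subcluster) // KIB:>4} KB  "
        f"L2={l2_cache_bytes(cluster) // KIB:>6} KB"
    )
```

Under these assumptions the fill amount is what separates the two columns when a backing file is present. If nothing were copied or zeroed in the no-backing-file case, the model would predict the same cost for subclusters=on and subclusters=off, which is the discrepancy being asked about.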