On 04/11/2017 09:49 AM, Kevin Wolf wrote:
>>>> Then (3) is effectively the same as (2), just that the subcluster
>>>> bitmaps are at the end of the L2 cluster, and not next to each entry.
>>>
>>> Exactly. But it's a difference in implementation, as you won't have to
>>> worry about having changed the L2 table layout; maybe that's a
>>> benefit.
>>
>> I'm not sure if that would simplify or complicate things, but it's worth
>> considering.
>
> Note that 64k between an L2 entry and the corresponding bitmap is enough
> to make an update not atomic any more. They need to be within the same
> sector to get atomicity.

Furthermore, there is a benefit to cache line packing - alternating
64 bits for the offset and 64 bits for the subclusters will fit all 128
bits in the same cache line, while having all offsets up front followed
by all subclusters later will not.

Worse, depending on architecture, if the 64 bits for the offset are at
the same relative offset to overall cache alignment as the 64 bits for
the subcluster (for example, with 1M clusters, if the offset is at 4M
and the subcluster info is at 5M), the alignments of the separate memory
pages may cause you to end up with both values competing for the same
cache line, causing ping-pong evictions and associated slowdowns.

Data locality really does want stuff that is commonly used together to
also appear close together in memory.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
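
To make the layout comparison concrete, here is a rough sketch (not qcow2
code) that computes where the offset word and the subcluster bitmap word of
one L2 entry would land under each layout. Everything in it is an assumption
chosen for illustration: 512-byte sectors as the atomic write unit, 64-byte
cache lines, 4096-byte pages, a 64k L2 cluster split evenly between offsets
and bitmaps, and all function names.

```python
SECTOR_SIZE = 512      # assumed atomic-write unit on disk
CACHE_LINE_SIZE = 64   # assumed CPU cache line size
PAGE_SIZE = 4096       # assumed memory page size
WORD_SIZE = 8          # one 64-bit word


def interleaved_layout(index):
    """Layout (2): each entry is an offset word followed by its bitmap word."""
    offset_pos = index * 2 * WORD_SIZE
    return offset_pos, offset_pos + WORD_SIZE


def split_layout(index, num_entries):
    """Layout (3): all offset words first, all bitmap words after them."""
    offset_pos = index * WORD_SIZE
    bitmap_pos = (num_entries + index) * WORD_SIZE
    return offset_pos, bitmap_pos


def same_block(pos_a, pos_b, block_size):
    """True if both byte positions fall inside the same aligned block."""
    return pos_a // block_size == pos_b // block_size


def describe(name, offset_pos, bitmap_pos):
    """Print how the two words of one entry relate to each other."""
    print(name)
    print("  same sector:     ", same_block(offset_pos, bitmap_pos, SECTOR_SIZE))
    print("  same cache line: ", same_block(offset_pos, bitmap_pos, CACHE_LINE_SIZE))
    print("  same page offset:", offset_pos % PAGE_SIZE == bitmap_pos % PAGE_SIZE)


if __name__ == "__main__":
    cluster_size = 64 * 1024                       # assumed L2 cluster size
    num_entries = cluster_size // (2 * WORD_SIZE)  # 16 bytes per entry
    describe("interleaved", *interleaved_layout(5))
    describe("split", *split_layout(5, num_entries))
```

Under these assumptions the interleaved pair always shares a sector and a
cache line, because each 16-byte entry is 16-byte aligned. The split pair
sits 32k apart, so it shares neither, and since 32k is a multiple of the
page size both words sit at the same offset within their pages. Whether
that last property actually leads to evictions depends on the cache's
indexing and associativity.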