On 1/11/19 10:22 AM, Vladimir Sementsov-Ogievskiy wrote:

>> Even a dumb most-recent use cache will speed this up: both the second
>> and third queries above can be avoided because we know that both 0x40000
>> and 0x30000 the second query at 0x40000 can be skipped (0x40000 is
>> between our most recent lseek at 0x20000 and hole at 0x10000)
>
> Is it correct just use results from previous iterations? In mirror source
> is active and may change.

If you keep a cache, you have to keep the cache up-to-date.  Any write
to an area that is covered by the known-hole cache has to flush the
cache, so that the next block status no longer sees a known-hole and
ends up doing another lseek.  Or, if the cache has enough state to track
unknown/known-hole/known-data, then writes update the cache to be
known-data, and future block status can skip the lseek by using the
results of the cache.

>
>>
>> Make the cache slightly larger, or use a bitmap with 2 bits per cluster
>> (tracking unknown, known-data, known-hole), with proper flushing of the
>> cache as we write to the image, or whatever, and we should automatically
>> get some performance improvements by using fewer lseek() anywhere that
>> we remember what previous lseek() already told us, with no knobs needed.
>>
>
> So the cache should consider all writes and discards. And it is obviously
> more difficult to implement it, than just don't call this lseek. And I
> don't understand, why cache + lseek is better for the case when we don't
> need nor the lseek neither the cache. Is this all to not add an option?
> Also Kevin objects to caching lseek in parallel sub-thread.

Kevin objected to caching anything if the image has multiple writers,
where an outside process could change the file allocation in between our
reads.  But multiple writers are rare - in fact, our image locking for
qcow2 formats tries to prevent multiple writers.
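
To make the unknown/known-hole/known-data idea concrete, here is a rough
sketch in Python (not QEMU code).  Every name in it (AllocationCache,
record_probe, note_write, note_discard, lookup) is made up for
illustration, it uses one list entry per cluster rather than a packed
2-bit bitmap, and it assumes a 64k cluster size and that discards simply
reset the state to unknown:

```python
from enum import IntEnum


class State(IntEnum):
    UNKNOWN = 0   # never probed, or invalidated by a discard
    DATA = 1      # treat as allocated; always a safe answer
    HOLE = 2      # only set when lseek() covered the whole cluster


class AllocationCache:
    """Hypothetical per-cluster cache of lseek(SEEK_DATA/SEEK_HOLE) results."""

    def __init__(self, image_size, cluster_size=0x10000):
        self.cluster_size = cluster_size
        nclusters = -(-image_size // cluster_size)  # round up
        self.state = [State.UNKNOWN] * nclusters

    def _touched(self, offset, length):
        """Clusters that overlap [offset, offset + length) at all."""
        if length <= 0:
            return range(0)
        first = offset // self.cluster_size
        last = (offset + length - 1) // self.cluster_size
        return range(first, last + 1)

    def _covered(self, offset, length):
        """Clusters that lie entirely inside [offset, offset + length)."""
        first = -(-offset // self.cluster_size)  # round up
        end = (offset + length) // self.cluster_size
        return range(first, end)

    def record_probe(self, offset, length, is_data):
        """Remember what an lseek() probe reported for a byte range."""
        if is_data:
            # Over-claiming data is harmless, so partial clusters count.
            for i in self._touched(offset, length):
                self.state[i] = State.DATA
        else:
            # A hole claim must never be stale or too wide.
            for i in self._covered(offset, length):
                self.state[i] = State.HOLE

    def note_write(self, offset, length):
        """Every write path must call this before readers see the data."""
        for i in self._touched(offset, length):
            self.state[i] = State.DATA

    def note_discard(self, offset, length):
        """A discard may or may not punch a hole, so forget what we knew."""
        for i in self._touched(offset, length):
            self.state[i] = State.UNKNOWN

    def lookup(self, offset):
        """Cached state for the cluster holding offset; UNKNOWN means lseek."""
        return self.state[offset // self.cluster_size]
```

A block status caller would check lookup() first and only fall back to
lseek() (followed by record_probe()) when the answer is UNKNOWN.  The
asymmetry is deliberate: a partial-cluster probe may mark a cluster as
data, but never as a hole.
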
Having multiple threads within one process writing is fine, as long as
they properly coordinate writes to the lseek cache so that readers never
see a stale claim of a hole - although a stale claim of data is safe.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org