From: Max Reitz <mreitz@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
qemu-block@nongnu.org
Cc: kwolf@redhat.com, jsnow@redhat.com, qemu-devel@nongnu.org,
ehabkost@redhat.com, crosa@redhat.com
Subject: Re: [PATCH v3 6/6] block/qcow2: use seqcache for compressed writes
Date: Mon, 15 Mar 2021 10:58:54 +0100 [thread overview]
Message-ID: <6056196d-a0cc-7de2-5d6f-b223fdee98ff@redhat.com> (raw)
In-Reply-To: <d5acfe9d-2095-a601-20b7-bd0b677df68a@virtuozzo.com>
On 12.03.21 19:43, Vladimir Sementsov-Ogievskiy wrote:
> 12.03.2021 21:15, Max Reitz wrote:
>> On 05.03.21 18:35, Vladimir Sementsov-Ogievskiy wrote:
>>> Compressed writes are unaligned to 512, which works very slow in
>>> O_DIRECT mode. Let's use the cache.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>> block/coroutines.h | 3 +
>>> block/qcow2.h | 4 ++
>>> block/qcow2-refcount.c | 10 +++
>>> block/qcow2.c | 158 ++++++++++++++++++++++++++++++++++++++---
>>> 4 files changed, 164 insertions(+), 11 deletions(-)
[...]
>>> @@ -2699,6 +2796,12 @@ static void qcow2_close(BlockDriverState *bs)
>>> qcow2_inactivate(bs);
>>> }
>>> + /*
>>> + * Cache should be flushed in qcow2_inactivate() and should be
>>> empty in
>>> + * inactive mode. So we are safe to free it.
>>> + */
>>> + seqcache_free(s->compressed_cache);
>>> +
>>> cache_clean_timer_del(bs);
>>> qcow2_cache_destroy(s->l2_table_cache);
>>> qcow2_cache_destroy(s->refcount_block_cache);
>>> @@ -4558,18 +4661,42 @@
>>> qcow2_co_pwritev_compressed_task(BlockDriverState *bs,
>>> goto fail;
>>> }
>>> - qcow2_inflight_writes_inc(bs, cluster_offset, out_len);
>>> + if (s->compressed_cache) {
>>
>> Why is this conditional?
>
> We don't have compressed_cache for non o_direct.
Oh right.
>>> + /*
>>> + * It's important to do seqcache_write() in the same
>>> critical section
>>> + * (by s->lock) as qcow2_alloc_compressed_cluster_offset(),
>>> so that the
>>> + * cache is filled sequentially.
>>> + */
>>
>> Yes.
>>
>>> + seqcache_write(s->compressed_cache, cluster_offset, out_len,
>>> out_buf);
>>> - qemu_co_mutex_unlock(&s->lock);
>>> + qemu_co_mutex_unlock(&s->lock);
>>> - BLKDBG_EVENT(s->data_file, BLKDBG_WRITE_COMPRESSED);
>>> - ret = bdrv_co_pwrite(s->data_file, cluster_offset, out_len,
>>> out_buf, 0);
>>> + ret = qcow2_co_compressed_flush_one(bs, false);
>>
>> The qcow2 doc says a compressed cluster can span multiple host
>> clusters. I don’t know whether that can happen with this driver, but
>> if it does, wouldn’t that mean we’d need to flush two clusters here?
>> Oh, no, never mind. Only the first one would be finished and thus
>> flushed, not the second one.
>>
>> I could have now removed the above paragraph, but it made me think, so
>> I kept it:
>>
>> Hm. Actually, if we unconditionally flush here, doesn’t that mean
>> that we’ll never have a finished cluster in the cache for longer than
>> the span between the seqcache_write() and this
>> qcow2_co_compressed_flush_one()? I.e., the
>> qcow2_co_flush_compressed_cache() is supposed to never flush any
>> finished cluster, but only the currently active unfinished cluster (if
>> there is one), right?
>
> Hmm. Maybe if we have parallel write and flush requests, it's a kind of
> race condition: may be flush will flush both finished and unfinished
> cluster, maybe write will flush the finished cluster and flush will
> flush only unfinished one.. Moreover we may have several parallel
> requests, so they make several finished clusters, and sudden flush will
> flush them all.
OK. I was mostly asking because I was wondering how much you expect the
cache to be filled, i.e., how much you expect the read cache to help.
[...]
>>> @@ -4681,10 +4808,19 @@ qcow2_co_preadv_compressed(BlockDriverState *bs,
>>> out_buf = qemu_blockalign(bs, s->cluster_size);
>>> - BLKDBG_EVENT(bs->file, BLKDBG_READ_COMPRESSED);
>>> - ret = bdrv_co_pread(bs->file, coffset, csize, buf, 0);
>>> - if (ret < 0) {
>>> - goto fail;
>>> + /*
>>> + * seqcache_read may return less bytes than csize, as csize may
>>> exceed
>>> + * actual compressed data size. So we are OK if seqcache_read
>>> returns
>>> + * something > 0.
>>
>> I was about to ask what happens when a compressed cluster spans two
>> host clusters (I could have imagined that in theory the second one
>> could have been discarded, but not the first one, so reading from the
>> cache would really be short -- we would have needed to check that we
>> only fell short in the range of 512 bytes, not more).
>>
>> But then I realized that in this version of the series, all finished
>> clusters are immediately discarded and only the current unfinished one
>> is kept. Does it even make sense to try seqcache_read() here, then?
>
> Hmm. Not immediately, but after flush. An flush is not under mutex. So
> in theory at some moment we may have several finished clusters
> "in-flight". And your question make sense. The cache supports reading
> from consequitive clusters. But we also should support here reading one
> part of data from disk and another from the cache to be safe.
The question is whether it really makes sense to even have a
seqcache_read() path when in reality it’s probably never accessed. I
mean, besides the fact that it seems based purely on chance whether a
read might fetch something from the cache even while we’re writing, in
practice I don’t know any case where we’d write to and read from a
compressed qcow2 image at the same time. (I don’t know what you’re
doing with the 'compress' filter, though.)
Max
next prev parent reply other threads:[~2021-03-15 10:01 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-05 17:35 [PATCH v3 0/6] qcow2: compressed write cache Vladimir Sementsov-Ogievskiy
2021-03-05 17:35 ` [PATCH v3 1/6] block-jobs: flush target at the end of .run() Vladimir Sementsov-Ogievskiy
2021-03-11 16:57 ` Max Reitz
2021-03-05 17:35 ` [PATCH v3 2/6] iotests: add qcow2-discard-during-rewrite Vladimir Sementsov-Ogievskiy
2021-03-05 17:35 ` [PATCH v3 3/6] block/qcow2: introduce inflight writes counters: fix discard Vladimir Sementsov-Ogievskiy
2021-03-11 19:58 ` Max Reitz
2021-03-12 9:09 ` Vladimir Sementsov-Ogievskiy
2021-03-12 11:17 ` Max Reitz
2021-03-12 12:32 ` Vladimir Sementsov-Ogievskiy
2021-03-12 12:42 ` Vladimir Sementsov-Ogievskiy
2021-03-12 15:01 ` Max Reitz
2021-03-12 12:46 ` Vladimir Sementsov-Ogievskiy
2021-03-12 15:10 ` Max Reitz
2021-03-12 15:24 ` Vladimir Sementsov-Ogievskiy
2021-03-12 15:52 ` Max Reitz
2021-03-12 16:03 ` Vladimir Sementsov-Ogievskiy
2021-03-12 14:58 ` Max Reitz
2021-03-12 15:39 ` Vladimir Sementsov-Ogievskiy
2021-03-05 17:35 ` [PATCH v3 4/6] util: implement seqcache Vladimir Sementsov-Ogievskiy
2021-03-12 13:41 ` Max Reitz
2021-03-12 14:37 ` Vladimir Sementsov-Ogievskiy
2021-03-12 15:13 ` Max Reitz
2021-06-04 14:31 ` Vladimir Sementsov-Ogievskiy
2021-03-05 17:35 ` [PATCH v3 5/6] block-coroutine-wrapper: allow non bdrv_ prefix Vladimir Sementsov-Ogievskiy
2021-03-12 16:53 ` Max Reitz
2021-03-05 17:35 ` [PATCH v3 6/6] block/qcow2: use seqcache for compressed writes Vladimir Sementsov-Ogievskiy
2021-03-12 18:15 ` Max Reitz
2021-03-12 18:43 ` Vladimir Sementsov-Ogievskiy
2021-03-15 9:58 ` Max Reitz [this message]
2021-03-15 14:40 ` Vladimir Sementsov-Ogievskiy
2021-03-16 12:25 ` Max Reitz
2021-03-16 17:48 ` Vladimir Sementsov-Ogievskiy
2021-03-17 8:09 ` Max Reitz
2021-03-12 18:45 ` Vladimir Sementsov-Ogievskiy
2021-03-29 20:18 ` Vladimir Sementsov-Ogievskiy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6056196d-a0cc-7de2-5d6f-b223fdee98ff@redhat.com \
--to=mreitz@redhat.com \
--cc=crosa@redhat.com \
--cc=ehabkost@redhat.com \
--cc=jsnow@redhat.com \
--cc=kwolf@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=vsementsov@virtuozzo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).