qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Max Reitz <mreitz@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-block@nongnu.org
Cc: kwolf@redhat.com, jsnow@redhat.com, qemu-devel@nongnu.org,
	ehabkost@redhat.com, crosa@redhat.com
Subject: Re: [PATCH v3 6/6] block/qcow2: use seqcache for compressed writes
Date: Mon, 15 Mar 2021 10:58:54 +0100	[thread overview]
Message-ID: <6056196d-a0cc-7de2-5d6f-b223fdee98ff@redhat.com> (raw)
In-Reply-To: <d5acfe9d-2095-a601-20b7-bd0b677df68a@virtuozzo.com>

On 12.03.21 19:43, Vladimir Sementsov-Ogievskiy wrote:
> 12.03.2021 21:15, Max Reitz wrote:
>> On 05.03.21 18:35, Vladimir Sementsov-Ogievskiy wrote:
>>> Compressed writes are unaligned to 512, which works very slow in
>>> O_DIRECT mode. Let's use the cache.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   block/coroutines.h     |   3 +
>>>   block/qcow2.h          |   4 ++
>>>   block/qcow2-refcount.c |  10 +++
>>>   block/qcow2.c          | 158 ++++++++++++++++++++++++++++++++++++++---
>>>   4 files changed, 164 insertions(+), 11 deletions(-)

[...]

>>> @@ -2699,6 +2796,12 @@ static void qcow2_close(BlockDriverState *bs)
>>>           qcow2_inactivate(bs);
>>>       }
>>> +    /*
>>> +     * Cache should be flushed in qcow2_inactivate() and should be 
>>> empty in
>>> +     * inactive mode. So we are safe to free it.
>>> +     */
>>> +    seqcache_free(s->compressed_cache);
>>> +
>>>       cache_clean_timer_del(bs);
>>>       qcow2_cache_destroy(s->l2_table_cache);
>>>       qcow2_cache_destroy(s->refcount_block_cache);
>>> @@ -4558,18 +4661,42 @@ 
>>> qcow2_co_pwritev_compressed_task(BlockDriverState *bs,
>>>           goto fail;
>>>       }
>>> -    qcow2_inflight_writes_inc(bs, cluster_offset, out_len);
>>> +    if (s->compressed_cache) {
>>
>> Why is this conditional?
> 
> We don't have compressed_cache for non o_direct.

Oh right.

>>> +        /*
>>> +         * It's important to do seqcache_write() in the same 
>>> critical section
>>> +         * (by s->lock) as qcow2_alloc_compressed_cluster_offset(), 
>>> so that the
>>> +         * cache is filled sequentially.
>>> +         */
>>
>> Yes.
>>
>>> +        seqcache_write(s->compressed_cache, cluster_offset, out_len, 
>>> out_buf);
>>> -    qemu_co_mutex_unlock(&s->lock);
>>> +        qemu_co_mutex_unlock(&s->lock);
>>> -    BLKDBG_EVENT(s->data_file, BLKDBG_WRITE_COMPRESSED);
>>> -    ret = bdrv_co_pwrite(s->data_file, cluster_offset, out_len, 
>>> out_buf, 0);
>>> +        ret = qcow2_co_compressed_flush_one(bs, false);
>>
>> The qcow2 doc says a compressed cluster can span multiple host 
>> clusters.  I don’t know whether that can happen with this driver, but 
>> if it does, wouldn’t that mean we’d need to flush two clusters here?  
>> Oh, no, never mind.  Only the first one would be finished and thus 
>> flushed, not the second one.
>>
>> I could have now removed the above paragraph, but it made me think, so 
>> I kept it:
>>
>> Hm.  Actually, if we unconditionally flush here, doesn’t that mean 
>> that we’ll never have a finished cluster in the cache for longer than 
>> the span between the seqcache_write() and this 
>> qcow2_co_compressed_flush_one()?  I.e., the 
>> qcow2_co_flush_compressed_cache() is supposed to never flush any 
>> finished cluster, but only the currently active unfinished cluster (if 
>> there is one), right?
> 
> Hmm. Maybe if we have parallel write and flush requests, it's a kind of 
> race condition: may be flush will flush both finished and unfinished 
> cluster, maybe write will flush the finished cluster and flush will 
> flush only unfinished one.. Moreover we may have several parallel 
> requests, so they make several finished clusters, and sudden flush will 
> flush them all.

OK.  I was mostly asking because I was wondering how much you expect the 
cache to be filled, i.e., how much you expect the read cache to help.

[...]

>>> @@ -4681,10 +4808,19 @@ qcow2_co_preadv_compressed(BlockDriverState *bs,
>>>       out_buf = qemu_blockalign(bs, s->cluster_size);
>>> -    BLKDBG_EVENT(bs->file, BLKDBG_READ_COMPRESSED);
>>> -    ret = bdrv_co_pread(bs->file, coffset, csize, buf, 0);
>>> -    if (ret < 0) {
>>> -        goto fail;
>>> +    /*
>>> +     * seqcache_read may return less bytes than csize, as csize may 
>>> exceed
>>> +     * actual compressed data size. So we are OK if seqcache_read 
>>> returns
>>> +     * something > 0.
>>
>> I was about to ask what happens when a compressed cluster spans two 
>> host clusters (I could have imagined that in theory the second one 
>> could have been discarded, but not the first one, so reading from the 
>> cache would really be short -- we would have needed to check that we 
>> only fell short in the range of 512 bytes, not more).
>>
>> But then I realized that in this version of the series, all finished 
>> clusters are immediately discarded and only the current unfinished one 
>> is kept.  Does it even make sense to try seqcache_read() here, then?
> 
> Hmm. Not immediately, but after flush. An flush is not under mutex. So 
> in theory at some moment we may have several finished clusters 
> "in-flight". And your question make sense. The cache supports reading 
> from consequitive clusters. But we also should support here reading one 
> part of data from disk and another from the cache to be safe.

The question is whether it really makes sense to even have a 
seqcache_read() path when in reality it’s probably never accessed.  I 
mean, besides the fact that it seems based purely on chance whether a 
read might fetch something from the cache even while we’re writing, in 
practice I don’t know any case where we’d write to and read from a 
compressed qcow2 image at the same time.  (I don’t know what you’re 
doing with the 'compress' filter, though.)

Max



  reply	other threads:[~2021-03-15 10:01 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-05 17:35 [PATCH v3 0/6] qcow2: compressed write cache Vladimir Sementsov-Ogievskiy
2021-03-05 17:35 ` [PATCH v3 1/6] block-jobs: flush target at the end of .run() Vladimir Sementsov-Ogievskiy
2021-03-11 16:57   ` Max Reitz
2021-03-05 17:35 ` [PATCH v3 2/6] iotests: add qcow2-discard-during-rewrite Vladimir Sementsov-Ogievskiy
2021-03-05 17:35 ` [PATCH v3 3/6] block/qcow2: introduce inflight writes counters: fix discard Vladimir Sementsov-Ogievskiy
2021-03-11 19:58   ` Max Reitz
2021-03-12  9:09     ` Vladimir Sementsov-Ogievskiy
2021-03-12 11:17       ` Max Reitz
2021-03-12 12:32         ` Vladimir Sementsov-Ogievskiy
2021-03-12 12:42           ` Vladimir Sementsov-Ogievskiy
2021-03-12 15:01             ` Max Reitz
2021-03-12 12:46           ` Vladimir Sementsov-Ogievskiy
2021-03-12 15:10             ` Max Reitz
2021-03-12 15:24               ` Vladimir Sementsov-Ogievskiy
2021-03-12 15:52                 ` Max Reitz
2021-03-12 16:03                   ` Vladimir Sementsov-Ogievskiy
2021-03-12 14:58           ` Max Reitz
2021-03-12 15:39             ` Vladimir Sementsov-Ogievskiy
2021-03-05 17:35 ` [PATCH v3 4/6] util: implement seqcache Vladimir Sementsov-Ogievskiy
2021-03-12 13:41   ` Max Reitz
2021-03-12 14:37     ` Vladimir Sementsov-Ogievskiy
2021-03-12 15:13       ` Max Reitz
2021-06-04 14:31   ` Vladimir Sementsov-Ogievskiy
2021-03-05 17:35 ` [PATCH v3 5/6] block-coroutine-wrapper: allow non bdrv_ prefix Vladimir Sementsov-Ogievskiy
2021-03-12 16:53   ` Max Reitz
2021-03-05 17:35 ` [PATCH v3 6/6] block/qcow2: use seqcache for compressed writes Vladimir Sementsov-Ogievskiy
2021-03-12 18:15   ` Max Reitz
2021-03-12 18:43     ` Vladimir Sementsov-Ogievskiy
2021-03-15  9:58       ` Max Reitz [this message]
2021-03-15 14:40         ` Vladimir Sementsov-Ogievskiy
2021-03-16 12:25           ` Max Reitz
2021-03-16 17:48             ` Vladimir Sementsov-Ogievskiy
2021-03-17  8:09               ` Max Reitz
2021-03-12 18:45     ` Vladimir Sementsov-Ogievskiy
2021-03-29 20:18     ` Vladimir Sementsov-Ogievskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6056196d-a0cc-7de2-5d6f-b223fdee98ff@redhat.com \
    --to=mreitz@redhat.com \
    --cc=crosa@redhat.com \
    --cc=ehabkost@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).