From: Max Reitz <mreitz@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-block@nongnu.org
Cc: kwolf@redhat.com, jsnow@redhat.com, qemu-devel@nongnu.org,
	ehabkost@redhat.com, crosa@redhat.com
Subject: Re: [PATCH v3 6/6] block/qcow2: use seqcache for compressed writes
Date: Mon, 15 Mar 2021 10:58:54 +0100	[thread overview]
Message-ID: <6056196d-a0cc-7de2-5d6f-b223fdee98ff@redhat.com> (raw)
In-Reply-To: <d5acfe9d-2095-a601-20b7-bd0b677df68a@virtuozzo.com>

On 12.03.21 19:43, Vladimir Sementsov-Ogievskiy wrote:
> 12.03.2021 21:15, Max Reitz wrote:
>> On 05.03.21 18:35, Vladimir Sementsov-Ogievskiy wrote:
>>> Compressed writes are not aligned to 512 bytes, which is very slow in
>>> O_DIRECT mode. Let's use the cache.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   block/coroutines.h     |   3 +
>>>   block/qcow2.h          |   4 ++
>>>   block/qcow2-refcount.c |  10 +++
>>>   block/qcow2.c          | 158 ++++++++++++++++++++++++++++++++++++++---
>>>   4 files changed, 164 insertions(+), 11 deletions(-)

[...]

>>> @@ -2699,6 +2796,12 @@ static void qcow2_close(BlockDriverState *bs)
>>>           qcow2_inactivate(bs);
>>>       }
>>> +    /*
>>> +     * Cache should be flushed in qcow2_inactivate() and should be empty in
>>> +     * inactive mode. So we are safe to free it.
>>> +     */
>>> +    seqcache_free(s->compressed_cache);
>>> +
>>>       cache_clean_timer_del(bs);
>>>       qcow2_cache_destroy(s->l2_table_cache);
>>>       qcow2_cache_destroy(s->refcount_block_cache);
>>> @@ -4558,18 +4661,42 @@ qcow2_co_pwritev_compressed_task(BlockDriverState *bs,
>>>           goto fail;
>>>       }
>>> -    qcow2_inflight_writes_inc(bs, cluster_offset, out_len);
>>> +    if (s->compressed_cache) {
>>
>> Why is this conditional?
> 
> We don't have a compressed_cache in the non-O_DIRECT case.

Oh right.

>>> +        /*
>>> +         * It's important to do seqcache_write() in the same critical section
>>> +         * (by s->lock) as qcow2_alloc_compressed_cluster_offset(), so that the
>>> +         * cache is filled sequentially.
>>> +         */
>>
>> Yes.
>>
>>> +        seqcache_write(s->compressed_cache, cluster_offset, out_len, out_buf);
>>> -    qemu_co_mutex_unlock(&s->lock);
>>> +        qemu_co_mutex_unlock(&s->lock);
>>> -    BLKDBG_EVENT(s->data_file, BLKDBG_WRITE_COMPRESSED);
>>> -    ret = bdrv_co_pwrite(s->data_file, cluster_offset, out_len, out_buf, 0);
>>> +        ret = qcow2_co_compressed_flush_one(bs, false);
>>
>> The qcow2 doc says a compressed cluster can span multiple host 
>> clusters.  I don’t know whether that can happen with this driver, but 
>> if it does, wouldn’t that mean we’d need to flush two clusters here?  
>> Oh, no, never mind.  Only the first one would be finished and thus 
>> flushed, not the second one.
>>
>> I could have now removed the above paragraph, but it made me think, so 
>> I kept it:
>>
>> Hm.  Actually, if we unconditionally flush here, doesn’t that mean 
>> that we’ll never have a finished cluster in the cache for longer than 
>> the span between the seqcache_write() and this 
>> qcow2_co_compressed_flush_one()?  I.e., the 
>> qcow2_co_flush_compressed_cache() is supposed to never flush any 
>> finished cluster, but only the currently active unfinished cluster (if 
>> there is one), right?
> 
> Hmm. With parallel write and flush requests it's a kind of race
> condition: maybe the flush will flush both the finished and the
> unfinished cluster, or maybe the write will flush the finished cluster
> and the flush will flush only the unfinished one. Moreover, we may have
> several parallel requests, so they produce several finished clusters,
> and a sudden flush will flush them all.

OK.  I was mostly asking because I was wondering how much you expect the 
cache to be filled, i.e., how much you expect the read cache to help.
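To make the flush-one behaviour we are discussing concrete, here is a purely illustrative, self-contained model of the seqcache semantics (hypothetical names and layout, not QEMU's actual util/seqcache.c API): writes must arrive sequentially, and flushing drains only a *finished* cluster, so the unfinished tail stays cached.

```c
/* Toy model of a sequential write cache: writes append to an in-memory
 * tail; flush-one drains exactly one finished (full) cluster, leaving
 * any unfinished tail in place.  Illustrative only. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CLUSTER_SIZE 64
#define CACHE_CAP    (4 * CLUSTER_SIZE)

typedef struct SeqCache {
    uint64_t start;             /* file offset of buf[0] */
    size_t   len;               /* bytes buffered so far */
    uint8_t  buf[CACHE_CAP];
} SeqCache;

/* Append data; the caller must write sequentially (hence doing this in
 * the same critical section as cluster allocation, so offsets never
 * interleave between concurrent writers). */
static void seqcache_write(SeqCache *c, uint64_t off, const void *data,
                           size_t bytes)
{
    assert(off == c->start + c->len);    /* sequential writes only */
    assert(c->len + bytes <= CACHE_CAP);
    memcpy(c->buf + c->len, data, bytes);
    c->len += bytes;
}

/* Flush one finished cluster, if any.  Returns the number of bytes
 * written out, or 0 if only an unfinished tail remains. */
static size_t seqcache_flush_one(SeqCache *c)
{
    if (c->len < CLUSTER_SIZE) {
        return 0;                        /* tail not finished yet */
    }
    /* Real code would bdrv_co_pwrite() buf[0..CLUSTER_SIZE) here. */
    memmove(c->buf, c->buf + CLUSTER_SIZE, c->len - CLUSTER_SIZE);
    c->start += CLUSTER_SIZE;
    c->len   -= CLUSTER_SIZE;
    return CLUSTER_SIZE;
}
```

In this model, calling the flush after every write means a finished cluster exists only in the short window between the write that completes it and the flush that drains it, which is exactly the point above.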

[...]

>>> @@ -4681,10 +4808,19 @@ qcow2_co_preadv_compressed(BlockDriverState *bs,
>>>       out_buf = qemu_blockalign(bs, s->cluster_size);
>>> -    BLKDBG_EVENT(bs->file, BLKDBG_READ_COMPRESSED);
>>> -    ret = bdrv_co_pread(bs->file, coffset, csize, buf, 0);
>>> -    if (ret < 0) {
>>> -        goto fail;
>>> +    /*
>>> +     * seqcache_read may return fewer bytes than csize, as csize may exceed
>>> +     * the actual compressed data size. So we are OK if seqcache_read returns
>>> +     * something > 0.
>>
>> I was about to ask what happens when a compressed cluster spans two 
>> host clusters (I could have imagined that in theory the second one 
>> could have been discarded, but not the first one, so reading from the 
>> cache would really be short -- we would have needed to check that we 
>> only fell short in the range of 512 bytes, not more).
>>
>> But then I realized that in this version of the series, all finished 
>> clusters are immediately discarded and only the current unfinished one 
>> is kept.  Does it even make sense to try seqcache_read() here, then?
> 
> Hmm. Not immediately, but after a flush. A flush is not done under the
> mutex, so in theory at some moment we may have several finished clusters
> "in flight", and your question makes sense. The cache supports reading
> from consecutive clusters. But to be safe we should also support reading
> one part of the data from disk and another part from the cache.

The question is whether it really makes sense to even have a 
seqcache_read() path when in reality it’s probably never accessed.  I 
mean, besides the fact that it seems based purely on chance whether a 
read might fetch something from the cache even while we’re writing, in 
practice I don’t know any case where we’d write to and read from a 
compressed qcow2 image at the same time.  (I don’t know what you’re 
doing with the 'compress' filter, though.)
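For what it's worth, the "part from disk, part from cache" read Vladimir describes can be sketched as follows (self-contained toy, hypothetical names; the real path would use seqcache_read() plus bdrv_co_pread()): since the cache holds a sequential tail of the file, a request overlapping it takes its suffix from the cache and its prefix from the disk.

```c
/* Toy mixed read: the cache buffers a sequential, not-yet-flushed tail
 * of the file starting at cache_start; anything before that must come
 * from the disk.  Illustrative only. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const uint8_t *disk;        /* stands in for bs->file */
    const uint8_t *cache_buf;   /* unflushed sequential tail */
    size_t cache_start;         /* file offset where the tail begins */
    size_t cache_len;
} Image;

/* Read [off, off+len) into buf, taking cached bytes where available. */
static void read_mixed(const Image *img, size_t off, size_t len,
                       uint8_t *buf)
{
    size_t end = off + len;
    /* How many trailing bytes of the request fall inside the cache? */
    size_t from_cache = end > img->cache_start ? end - img->cache_start : 0;
    if (from_cache > len) {
        from_cache = len;       /* request lies entirely in the cache */
    }
    size_t from_disk = len - from_cache;

    memcpy(buf, img->disk + off, from_disk);                  /* prefix */
    memcpy(buf + from_disk,
           img->cache_buf + (off + from_disk - img->cache_start),
           from_cache);                                       /* suffix */
}
```

Whether this extra path pays off is exactly the question above: it only helps a reader that races with an in-flight compressed writer.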

Max




Thread overview: 35+ messages
2021-03-05 17:35 [PATCH v3 0/6] qcow2: compressed write cache Vladimir Sementsov-Ogievskiy
2021-03-05 17:35 ` [PATCH v3 1/6] block-jobs: flush target at the end of .run() Vladimir Sementsov-Ogievskiy
2021-03-11 16:57   ` Max Reitz
2021-03-05 17:35 ` [PATCH v3 2/6] iotests: add qcow2-discard-during-rewrite Vladimir Sementsov-Ogievskiy
2021-03-05 17:35 ` [PATCH v3 3/6] block/qcow2: introduce inflight writes counters: fix discard Vladimir Sementsov-Ogievskiy
2021-03-11 19:58   ` Max Reitz
2021-03-12  9:09     ` Vladimir Sementsov-Ogievskiy
2021-03-12 11:17       ` Max Reitz
2021-03-12 12:32         ` Vladimir Sementsov-Ogievskiy
2021-03-12 12:42           ` Vladimir Sementsov-Ogievskiy
2021-03-12 15:01             ` Max Reitz
2021-03-12 12:46           ` Vladimir Sementsov-Ogievskiy
2021-03-12 15:10             ` Max Reitz
2021-03-12 15:24               ` Vladimir Sementsov-Ogievskiy
2021-03-12 15:52                 ` Max Reitz
2021-03-12 16:03                   ` Vladimir Sementsov-Ogievskiy
2021-03-12 14:58           ` Max Reitz
2021-03-12 15:39             ` Vladimir Sementsov-Ogievskiy
2021-03-05 17:35 ` [PATCH v3 4/6] util: implement seqcache Vladimir Sementsov-Ogievskiy
2021-03-12 13:41   ` Max Reitz
2021-03-12 14:37     ` Vladimir Sementsov-Ogievskiy
2021-03-12 15:13       ` Max Reitz
2021-06-04 14:31   ` Vladimir Sementsov-Ogievskiy
2021-03-05 17:35 ` [PATCH v3 5/6] block-coroutine-wrapper: allow non bdrv_ prefix Vladimir Sementsov-Ogievskiy
2021-03-12 16:53   ` Max Reitz
2021-03-05 17:35 ` [PATCH v3 6/6] block/qcow2: use seqcache for compressed writes Vladimir Sementsov-Ogievskiy
2021-03-12 18:15   ` Max Reitz
2021-03-12 18:43     ` Vladimir Sementsov-Ogievskiy
2021-03-15  9:58       ` Max Reitz [this message]
2021-03-15 14:40         ` Vladimir Sementsov-Ogievskiy
2021-03-16 12:25           ` Max Reitz
2021-03-16 17:48             ` Vladimir Sementsov-Ogievskiy
2021-03-17  8:09               ` Max Reitz
2021-03-12 18:45     ` Vladimir Sementsov-Ogievskiy
2021-03-29 20:18     ` Vladimir Sementsov-Ogievskiy
