From: Max Reitz <mreitz@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-block@nongnu.org
Cc: kwolf@redhat.com, jsnow@redhat.com, qemu-devel@nongnu.org,
	ehabkost@redhat.com, crosa@redhat.com
Subject: Re: [PATCH v3 3/6] block/qcow2: introduce inflight writes counters: fix discard
Date: Fri, 12 Mar 2021 16:52:24 +0100
Message-ID: <59e304ba-b6a1-8214-a243-01dd69d56732@redhat.com>
In-Reply-To: <5fdf28fa-5a67-e3ff-a592-dc6c9285d75a@virtuozzo.com>

On 12.03.21 16:24, Vladimir Sementsov-Ogievskiy wrote:
> 12.03.2021 18:10, Max Reitz wrote:
>> On 12.03.21 13:46, Vladimir Sementsov-Ogievskiy wrote:
>>> 12.03.2021 15:32, Vladimir Sementsov-Ogievskiy wrote:
>>>> 12.03.2021 14:17, Max Reitz wrote:
>>>>> On 12.03.21 10:09, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> 11.03.2021 22:58, Max Reitz wrote:
>>>>>>> On 05.03.21 18:35, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>> There is a bug in qcow2: host cluster can be discarded (refcount
>>>>>>>> becomes 0) and reused during data write. In this case data write 
>>>>>>>> may
>>>>
>>>> [..]
>>>>
>>>>>>>> @@ -885,6 +1019,13 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
>>>>>>>>           if (refcount == 0) {
>>>>>>>>               void *table;
>>>>>>>> +            Qcow2InFlightRefcount *infl = find_infl_wr(s, cluster_index);
>>>>>>>> +
>>>>>>>> +            if (infl) {
>>>>>>>> +                infl->refcount_zero = true;
>>>>>>>> +                infl->type = type;
>>>>>>>> +                continue;
>>>>>>>> +            }
>>>>>>>
>>>>>>> I don’t understand what this is supposed to do exactly.  It seems 
>>>>>>> like it wants to keep metadata structures in the cache that are 
>>>>>>> still in use (because dropping them from the caches is what 
>>>>>>> happens next), but users of metadata structures won’t set 
>>>>>>> in-flight counters for those metadata structures, will they?
>>>>>>
>>>>>> Don't follow.
>>>>>>
>>>>>> We want the code in "if (refcount == 0)" to be triggered only when 
>>>>>> full reference count of the host cluster becomes 0, including 
>>>>>> inflight-write-cnt. So, if at this point inflight-write-cnt is not 
>>>>>> 0, we postpone freeing the host cluster, it will be done later 
>>>>>> from "slow path" in update_inflight_write_cnt().
>>>>>
>>>>> But the code under “if (refcount == 0)” doesn’t free anything, does 
>>>>> it?  All I can see is code to remove metadata structures from the 
>>>>> metadata caches (if the discarded cluster was an L2 table or a 
>>>>> refblock), and finally the discard on the underlying file.  I don’t 
>>>>> see how that protocol-level discard has anything to do with our 
>>>>> problem, though.
>>>>
>>>> Hmm. Still, if we do this discard, and then our in-flight write, 
>>>> we'll have data instead of a hole. Not a big deal, but seems better 
>>>> to postpone discard.
>>>>
>>>> On the other hand, clearing caches is OK, as it's related only to 
>>>> qcow2-refcount, not to inflight-write-cnt.
>>>>
>>>>>
>>>>> As far as I understand, the freeing happens immediately above the 
>>>>> “if (refcount == 0)” block by s->set_refcount() setting the 
>>>>> refcount to 0. (including updating s->free_cluster_index if the 
>>>>> refcount is 0).
>>>>
>>>> Hmm.. And that (setting s->free_cluster_index) is what I should 
>>>> actually prevent until the total reference count becomes zero.
>>>>
>>>> And about s->set_refcount(): it only updates the refcount itself, and 
>>>> doesn't free anything.
>>>>
>>>>
>>>
>>> So, it is more correct like this:
>>>
>>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>>> index 464d133368..1da282446d 100644
>>> --- a/block/qcow2-refcount.c
>>> +++ b/block/qcow2-refcount.c
>>> @@ -1012,21 +1012,12 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
>>>           } else {
>>>               refcount += addend;
>>>           }
>>> -        if (refcount == 0 && cluster_index < s->free_cluster_index) {
>>> -            s->free_cluster_index = cluster_index;
>>> -        }
>>>           s->set_refcount(refcount_block, block_index, refcount);
>>>
>>>           if (refcount == 0) {
>>>               void *table;
>>>               Qcow2InFlightRefcount *infl = find_infl_wr(s, cluster_index);
>>>
>>> -            if (infl) {
>>> -                infl->refcount_zero = true;
>>> -                infl->type = type;
>>> -                continue;
>>> -            }
>>> -
>>>               table = qcow2_cache_is_table_offset(s->refcount_block_cache,
>>>                                                   offset);
>>>               if (table != NULL) {
>>> @@ -1040,6 +1031,16 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
>>>                   qcow2_cache_discard(s->l2_table_cache, table);
>>>               }
>>>
>>> +            if (infl) {
>>> +                infl->refcount_zero = true;
>>> +                infl->type = type;
>>> +                continue;
>>> +            }
>>> +
>>> +            if (cluster_index < s->free_cluster_index) {
>>> +                s->free_cluster_index = cluster_index;
>>> +            }
>>> +
>>>               if (s->discard_passthrough[type]) {
>>>                   update_refcount_discard(bs, cluster_offset, s->cluster_size);
>>>               }
>>
>> I don’t think I like using s->free_cluster_index as a protection 
>> against allocating something before it.
> 
> Hmm, I just propose not to update it, if refcount reached 0 but we still 
> have inflight writes.
> 
> 
>>
>> First, it comes back to the problem I just described in my mail from 
>> 15:58 GMT+1, which is that you’re changing the definition of what a 
>> free cluster is.  With this proposal, you’re proposing yet another 
>> definition: A free cluster is anything with refcount == 0 after 
>> free_cluster_index.
> 
> I think that a free cluster is anything with refcount = 0 and 
> inflight-write-cnt = 0.

Then, as I said in my other mail, update_refcount() just cannot free any 
cluster.  So changes to that function can’t be justified by preventing 
it from freeing clusters.

You need to clearly define what it is that update_refcount() should or 
shouldn’t do, and then we have to think about whether when all writes 
have settled, we really have to invoke qcow2_update_cluster_refcount() 
or whether we should do the small outstanding changes just directly in 
update_inflight_write_cnt().

I think this needs to be more formalized, or it doesn’t make sense.

For example, say we do define a free cluster to be refcount (RC) = 0 and 
inflight-write-cnt (IFWC) = 0.  Then everything that is currently done to 
a cluster when its RC drops to 0, because that is the point where it is 
considered freed, must probably be changed to be done only if its IFWC is 
also 0.  For example, we should only discard host clusters on the protocol 
layer if a cluster becomes free.  update_refcount() will no longer be 
able to free clusters with IFWC > 0, so it must never issue a 
protocol-level discard for them.  And, yes, it also shouldn’t adjust 
free_cluster_index, as you propose here.  (But you didn’t explain 
why, and it seems like it was just intuition to you instead of looking 
at it more formally.)

Instead, for clusters with RC = 0 and IFWC > 0, 
update_inflight_write_cnt() will take on the role of freeing them.  So 
now that function must adjust free_cluster_index and issue the 
protocol-level discard for such clusters.
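
To make that rule concrete, here is a minimal toy sketch in C.  It is not 
QEMU code: ToyState and every toy_* name are made up for illustration, the 
"discard" is only a flag, and the real functions take very different 
arguments.  It only shows the split of responsibilities under the 
assumption that "free" means RC = 0 and IFWC = 0: whichever of the two 
counters reaches 0 last does the freeing work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NB_CLUSTERS 64

/* Toy stand-in for the per-image state; not the real BDRVQcow2State. */
typedef struct ToyState {
    uint16_t rc[TOY_NB_CLUSTERS];     /* refcount (RC) per host cluster */
    uint16_t ifwc[TOY_NB_CLUSTERS];   /* in-flight-write count (IFWC) */
    bool discarded[TOY_NB_CLUSTERS];  /* models a protocol-level discard */
    int64_t free_cluster_index;       /* hint: where to start searching */
} ToyState;

/* Assumed definition: a cluster is free only if both counters are 0. */
static bool toy_cluster_is_free(const ToyState *s, int64_t i)
{
    return s->rc[i] == 0 && s->ifwc[i] == 0;
}

/* RC path: may free the cluster only if no write is in flight. */
static void toy_update_refcount(ToyState *s, int64_t i, int addend)
{
    s->rc[i] = (uint16_t)(s->rc[i] + addend);

    if (s->rc[i] == 0 && s->ifwc[i] == 0) {
        if (i < s->free_cluster_index) {
            s->free_cluster_index = i;
        }
        s->discarded[i] = true;
    }
}

/* IFWC path: takes over the freeing if RC already dropped to 0. */
static void toy_update_inflight_write_cnt(ToyState *s, int64_t i, int addend)
{
    s->ifwc[i] = (uint16_t)(s->ifwc[i] + addend);

    if (s->ifwc[i] == 0 && s->rc[i] == 0) {
        if (i < s->free_cluster_index) {
            s->free_cluster_index = i;
        }
        s->discarded[i] = true;
    }
}
```

With RC = 1 and IFWC = 1 on some cluster, dropping RC first leaves the 
hint and the discard flag untouched; both only change once the in-flight 
write is accounted as finished.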

I suppose in practice we could invoke qcow2_update_cluster_refcount() 
with -0, as you do, because now the cluster has RC = 0 and IFWC = 0, so 
now that function will be capable of freeing it.  But to me, that just 
looks like a bit of abuse.


I suppose we could create a new function qcow2_cluster_freed() where we 
collect everything that needs to be done once a cluster is considered 
freed (which so far was whenever its RC dropped to 0, which only happens 
in update_refcount(); and then will be whenever its RC and its IFWC drop 
to 0, which can happen in either update_refcount() or 
update_inflight_write_cnt()).  What would belong in there is discarding 
the cluster on the protocol level, and adjusting 
free_cluster_index.  (Perhaps more, I don’t know.)  With such a 
function, it would seem clear to me that there is no need to invoke 
qcow2_update_cluster_refcount() just to get precisely that effect.
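
Roughly what I have in mind, as a toy sketch in C rather than real qcow2 
code: ToyCluster, toy_cluster_freed() and the other toy_* names are 
hypothetical, and the body of the helper only counts invocations where the 
real thing would discard on the protocol level and adjust 
free_cluster_index.  The point is just that both paths funnel into one 
place, and that it runs exactly once no matter which counter hits 0 last.

```c
#include <assert.h>

/* Toy model of one host cluster; not a real qcow2 structure. */
typedef struct ToyCluster {
    unsigned rc;            /* refcount (RC) */
    unsigned ifwc;          /* in-flight-write count (IFWC) */
    unsigned freed_events;  /* how often the "freed" work ran */
} ToyCluster;

/*
 * Hypothetical analogue of the proposed qcow2_cluster_freed(): collects
 * everything that must happen once a cluster is considered freed.
 */
static void toy_cluster_freed(ToyCluster *c)
{
    /* Real code would discard the cluster and adjust the free hint. */
    c->freed_events++;
}

/* Shared check used by both paths below. */
static void toy_check_freed(ToyCluster *c)
{
    if (c->rc == 0 && c->ifwc == 0) {
        toy_cluster_freed(c);
    }
}

/* Models update_refcount() dropping one reference. */
static void toy_refcount_dec(ToyCluster *c)
{
    assert(c->rc > 0);
    c->rc--;
    toy_check_freed(c);
}

/* Models update_inflight_write_cnt() when one write completes. */
static void toy_inflight_write_done(ToyCluster *c)
{
    assert(c->ifwc > 0);
    c->ifwc--;
    toy_check_freed(c);
}
```

In this model neither path needs to call back into the other one just to 
trigger the freeing work.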


(The alternative would be to keep RC == 0 the definition of a freed 
cluster.  Then we’d have to postpone the s->set_refcount() in 
update_refcount(), and update the refcount again in 
update_inflight_write_cnt(), by invoking 
qcow2_update_cluster_refcount().  We wouldn’t need to change the 
allocation functions.

I’m not saying that alternative is better – I don’t think it is, I think 
you’re right that the definition of a freed cluster should be changed. 
I’m just presenting it in contrast, to show when it would make sense to 
call qcow2_update_cluster_refcount().)

> And free_cluster_index is a hint where to start 
> searching for such a cluster.
> 
>>
>> Now looking only at the allocation functions, it may look like that 
>> kind of is the definition already.  But I don’t think that was the 
>> intention when free_cluster_index was introduced, so we’d have to 
>> check every place that sets free_cluster_index, to see whether it 
>> adheres to this definition.
>>
>> And I think it’s clear that there is a place that won’t adhere to this 
>> definition, and that is this very place here, in update_refcount().  
>> Say free_cluster_index is 42.  Then you free cluster 39, but there is 
>> a write to it, so free_cluster_index isn’t updated.  Then you free 
>> cluster 38, and there are no writes to that cluster, so 
>> free_cluster_index is updated to 38.  Suddenly, 39 is free to be 
>> allocated, too.
> 
> Why? 39 is protected by inflight-cnt, and we do the has_infl_wr() check 
> together with the refcount==0 check when allocating clusters.

I was (wrongly) assuming that with this change you’d drop the check in 
the allocation functions.
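
For the record, the 42/39/38 scenario can be played through with a toy 
allocator.  This is only a sketch in C under my own assumptions: 
ToyAllocState, toy_alloc_cluster() and toy_scenario() are invented names, 
toy_has_infl_wr() merely stands in for the has_infl_wr() check you 
mention, and the real allocation functions look different.  The 
check_inflight flag exists purely to contrast the two behaviours.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NB_CLUSTERS 64

/* Toy allocator state; not the real BDRVQcow2State. */
typedef struct ToyAllocState {
    uint16_t rc[TOY_NB_CLUSTERS];    /* refcount per host cluster */
    uint16_t ifwc[TOY_NB_CLUSTERS];  /* in-flight writes per host cluster */
    int64_t free_cluster_index;      /* hint: where to start searching */
} ToyAllocState;

/* Stand-in for the has_infl_wr() check mentioned in the thread. */
static bool toy_has_infl_wr(const ToyAllocState *s, int64_t i)
{
    return s->ifwc[i] > 0;
}

/*
 * Scan upward from the hint and claim the first usable cluster.
 * Returns the cluster index, or -1 if nothing is available.
 */
static int64_t toy_alloc_cluster(ToyAllocState *s, bool check_inflight)
{
    for (int64_t i = s->free_cluster_index; i < TOY_NB_CLUSTERS; i++) {
        if (s->rc[i] != 0) {
            continue;
        }
        if (check_inflight && toy_has_infl_wr(s, i)) {
            continue;  /* RC is 0, but a write is still in flight */
        }
        s->rc[i] = 1;
        s->free_cluster_index = i + 1;
        return i;
    }
    return -1;
}

/*
 * Clusters 0..41 are in use, except 38 (free) and 39 (RC = 0 but with
 * one in-flight write).  The hint was lowered to 38 when 38 was freed.
 */
static ToyAllocState toy_scenario(void)
{
    ToyAllocState s = {0};

    for (int64_t i = 0; i < 42; i++) {
        s.rc[i] = 1;
    }
    s.rc[38] = 0;
    s.rc[39] = 0;
    s.ifwc[39] = 1;
    s.free_cluster_index = 38;
    return s;
}
```

With the check in place, two allocations in a row yield 38 and then 42, so 
39 stays protected.  Without it, the second allocation hands out 39 while 
the write is still in flight, which is the case I was worried about.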

Max

>> (The precise problem is that with this new definition decreasing 
>> free_cluster_index suddenly has the power to free any cluster between 
>> its new and old value.  With the old definition, changing 
>> free_cluster_index would never free any cluster.  So when you decrease 
>> free_cluster_index, you suddenly have to be sure that all clusters 
>> between the new and old value that have refcount 0 are indeed to be 
>> considered free.)
>>
>> Max
>>
> 
> 



