* [PATCH 0/1] Fix qcow2 corruption after addition of subcluster support
From: Maxim Levitsky @ 2020-11-23 15:49 UTC
To: qemu-devel
Cc: Kevin Wolf, vsementsov, Alberto Garcia, qemu-block, zhang_youjia, Max Reitz, andrey.shinkevich, Maxim Levitsky

This weekend I discovered that one of my VMs had started to act weird.
While investigating, I found out that it, and most of the other VMs I
have, had developed qcow2 corruption.

After some bisecting, digging through dumps, and debugging, I think I
found the root cause and a fix.

In addition, I would like to raise a few points:

1. I had to use qcow2-dump from (*) (it is also on GitHub, but without
   source -- weird...) to examine the L1/L2 tables and refcount tables.

   There have been a few attempts (**), (***) to make an official tool
   that would dump at least the L1/L2/refcount tables, but nothing has
   been accepted so far.

   I think an official tool to dump at least the basic qcow2 structures
   would be very helpful for discovering and debugging qcow2 corruption.
   I had to study the qcow2 format again for this, so I can help with
   that.

2. 'qemu-img check -r all' is happy to create clusters that are
   referenced from multiple L2 entries. This isn't technically wrong,
   since a write through any of these L2 entries will COW the cluster.
   However, I would be happy to know that my images don't have such
   clusters, so I would like qemu-img check to at least notify about
   this. Can we add some -check-weird-but-legal flag to it for this
   check?

A few notes about the conditions for this corruption to occur:

I have a bunch of VMs, each running on two qcow2 files: a base image and
a snapshot on top of it, which I 'qemu-img commit' once in a while.
Discard is enabled to avoid wasting disk space, so 'qemu-img commit'
often discards data on the base disk.

The corruption happens after such a commit, and manifests in a stale L2
entry that was supposed to be discarded but now points to an unused
cluster. I wasn't able to reproduce this in a small test case so far.

Best regards,
	Maxim Levitsky

(*)   https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg02760.html
(**)  https://patchwork.kernel.org/project/qemu-devel/patch/20180328133845.20632-1-berto@igalia.com/
(***) https://patchwork.kernel.org/project/qemu-devel/cover/1578990137-308222-1-git-send-email-andrey.shinkevich@virtuozzo.com/

Maxim Levitsky (1):
  Fix qcow2 corruption on discard

 block/qcow2-cluster.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.26.2
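[Editor's note: as an illustration of what the dump tool discussed in the
cover letter could start from, here is a minimal sketch that parses the
fixed part of a qcow2 header and decodes an L1 entry. It is not QEMU code;
the field layout and bit masks are taken from the qcow2 on-disk format
specification (docs/interop/qcow2.txt in the QEMU tree), and the function
names are invented for this sketch.]

```python
import struct

# Layout of the first 72 bytes of the qcow2 header, per the qcow2 spec;
# all fields are big-endian.
HEADER_FMT = ">IIQIIQIIQQIIQ"
HEADER_FIELDS = (
    "magic", "version", "backing_file_offset", "backing_file_size",
    "cluster_bits", "size", "crypt_method", "l1_size", "l1_table_offset",
    "refcount_table_offset", "refcount_table_clusters", "nb_snapshots",
    "snapshots_offset",
)
QCOW2_MAGIC = 0x514649FB  # the bytes 'Q', 'F', 'I', 0xfb

def parse_header(buf: bytes) -> dict:
    """Parse the fixed part of a qcow2 header into a field dict."""
    fields = struct.unpack(HEADER_FMT, buf[:struct.calcsize(HEADER_FMT)])
    hdr = dict(zip(HEADER_FIELDS, fields))
    if hdr["magic"] != QCOW2_MAGIC:
        raise ValueError("not a qcow2 image")
    return hdr

def l1_entry_info(entry: int) -> dict:
    """Decode one 64-bit L1 table entry (bits 9-55: L2 table offset)."""
    return {
        "l2_offset": entry & 0x00FFFFFFFFFFFE00,
        "copied": bool(entry & (1 << 63)),  # refcount == 1, safe to write
    }
```

From here, a real tool would read the L1 table at `l1_table_offset`, then
each L2 table it references, which is enough to spot stale entries like the
one this thread is about.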
* [PATCH 1/1] Fix qcow2 corruption on discard
From: Maxim Levitsky @ 2020-11-23 15:49 UTC
To: qemu-devel
Cc: Kevin Wolf, vsementsov, Alberto Garcia, qemu-block, zhang_youjia, Max Reitz, andrey.shinkevich, Maxim Levitsky

Commit 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
introduced a subtle change to code in zero_in_l2_slice:

It swapped the order of

1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
2. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
3. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);

To

1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
2. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
3. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);

It seems harmless, however the call to qcow2_free_any_clusters
can trigger a cache flush which can mark the L2 table as clean,
and assuming that this was the last write to it,
a stale version of it will remain on the disk.

Now we have a valid L2 entry pointing to a freed cluster. Oops.

Fixes: 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 block/qcow2-cluster.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 485b4cb92e..267b46a4ca 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -2010,11 +2010,11 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
             continue;
         }
 
-        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
         if (unmap) {
             qcow2_free_any_cluster(bs, old_l2_entry, QCOW2_DISCARD_REQUEST);
         }
         set_l2_entry(s, l2_slice, l2_index + i, new_l2_entry);
+        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
         if (has_subclusters(s)) {
             set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
         }
-- 
2.26.2
* Re: [PATCH 1/1] Fix qcow2 corruption on discard
From: Kevin Wolf @ 2020-11-23 16:09 UTC
To: Maxim Levitsky
Cc: vsementsov, Alberto Garcia, qemu-block, zhang_youjia, qemu-devel, Max Reitz, andrey.shinkevich

On 23.11.2020 at 16:49, Maxim Levitsky wrote:
> Commit 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
> introduced a subtle change to code in zero_in_l2_slice:
>
> It swapped the order of
>
> 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> 2. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
> 3. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
>
> To
>
> 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> 2. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
> 3. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
>
> It seems harmless, however the call to qcow2_free_any_clusters
> can trigger a cache flush which can mark the L2 table as clean,
> and assuming that this was the last write to it,
> a stale version of it will remain on the disk.
>
> Now we have a valid L2 entry pointing to a freed cluster. Oops.
>
> Fixes: 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  block/qcow2-cluster.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 485b4cb92e..267b46a4ca 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -2010,11 +2010,11 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
>              continue;
>          }
>  
> -        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
>          if (unmap) {
>              qcow2_free_any_cluster(bs, old_l2_entry, QCOW2_DISCARD_REQUEST);
>          }
>          set_l2_entry(s, l2_slice, l2_index + i, new_l2_entry);
> +        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);

Good catch, but I think your order is wrong, too. We need the original
order from before 205fa50750:

1. qcow2_cache_entry_mark_dirty()
   set_l2_entry() + set_l2_bitmap()

2. qcow2_free_any_cluster()

The order between qcow2_cache_entry_mark_dirty() and set_l2_entry()
shouldn't matter, but it's important that we update the refcount table
only after the L2 table has been updated to not reference the cluster
any more. Otherwise a crash could lead to a situation where the cluster
is allocated (because it has refcount 0), but it was still in use in an
L2 table. This is a classic corruption scenario.

Kevin
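[Editor's note: Kevin's crash-ordering argument can be made concrete with
a toy model. This is an illustration only, not QEMU code: it treats each
metadata update as reaching the disk atomically and in order, and checks
the invariant that no cluster may still be referenced by an L2 entry while
its refcount is 0 (such a cluster could be reallocated and overwritten).
All names below are invented for the sketch.]

```python
def crash_states(updates, state):
    """Yield the on-disk state after every possible crash point,
    i.e. after each prefix of the update sequence has hit the disk."""
    for i in range(len(updates) + 1):
        s = dict(state)
        for update in updates[:i]:
            update(s)
        yield s

def is_safe(s):
    # The corruption: an L2 entry still points at a cluster with refcount 0.
    return not (s["l2_points_to_cluster"] and s["refcount"] == 0)

initial = {"l2_points_to_cluster": True, "refcount": 1}
clear_l2 = lambda s: s.update(l2_points_to_cluster=False)
free_cluster = lambda s: s.update(refcount=s["refcount"] - 1)

# Original (safe) order: the L2 entry is cleared on disk before the
# refcount drops. A crash in between only leaks a cluster, which
# 'qemu-img check' can repair.
safe = all(is_safe(s) for s in crash_states([clear_l2, free_cluster], initial))

# Swapped order: a crash after the free but before the L2 write leaves a
# valid L2 entry pointing at a freed cluster.
unsafe = all(is_safe(s) for s in crash_states([free_cluster, clear_l2], initial))

print(safe, unsafe)  # True False
```

The model is crude, but it shows why the free must come last: every crash
point in the safe order satisfies the invariant, while the swapped order
has one crash point that does not.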
* Re: [PATCH 1/1] Fix qcow2 corruption on discard
From: Alberto Garcia @ 2020-11-23 18:20 UTC
To: Kevin Wolf, Maxim Levitsky
Cc: vsementsov, qemu-block, zhang_youjia, qemu-devel, Max Reitz, andrey.shinkevich

On Mon 23 Nov 2020 05:09:41 PM CET, Kevin Wolf wrote:
>> Commit 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
>> introduced a subtle change to code in zero_in_l2_slice:
>>
>> It swapped the order of
>>
>> 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
>> 2. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
>> 3. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
>>
>> To
>>
>> 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
>> 2. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
>> 3. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);

Ouch :( Good catch!

>> -        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
>>          if (unmap) {
>>              qcow2_free_any_cluster(bs, old_l2_entry, QCOW2_DISCARD_REQUEST);
>>          }
>>          set_l2_entry(s, l2_slice, l2_index + i, new_l2_entry);
>> +        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
>
> Good catch, but I think your order is wrong, too. We need the original
> order from before 205fa50750:
>
> 1. qcow2_cache_entry_mark_dirty()
>    set_l2_entry() + set_l2_bitmap()
>
> 2. qcow2_free_any_cluster()

I agree with Kevin on this.

Berto
* Re: [PATCH 1/1] Fix qcow2 corruption on discard
From: Maxim Levitsky @ 2020-11-23 18:23 UTC
To: Alberto Garcia, Kevin Wolf
Cc: vsementsov, qemu-block, zhang_youjia, qemu-devel, Max Reitz, andrey.shinkevich

On Mon, 2020-11-23 at 19:20 +0100, Alberto Garcia wrote:
> On Mon 23 Nov 2020 05:09:41 PM CET, Kevin Wolf wrote:
> > > Commit 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
> > > introduced a subtle change to code in zero_in_l2_slice:
> > >
> > > It swapped the order of
> > >
> > > 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> > > 2. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
> > > 3. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
> > >
> > > To
> > >
> > > 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> > > 2. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
> > > 3. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
>
> Ouch :( Good catch!
>
> > > -        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> > >          if (unmap) {
> > >              qcow2_free_any_cluster(bs, old_l2_entry, QCOW2_DISCARD_REQUEST);
> > >          }
> > >          set_l2_entry(s, l2_slice, l2_index + i, new_l2_entry);
> > > +        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> >
> > Good catch, but I think your order is wrong, too. We need the original
> > order from before 205fa50750:
> >
> > 1. qcow2_cache_entry_mark_dirty()
> >    set_l2_entry() + set_l2_bitmap()
> >
> > 2. qcow2_free_any_cluster()
>
> I agree with Kevin on this.

I also agree; I hadn't thought about this.

Best regards,
	Maxim Levitsky
* Re: [PATCH 1/1] Fix qcow2 corruption on discard
From: Kevin Wolf @ 2020-11-23 17:38 UTC
To: Maxim Levitsky
Cc: vsementsov, Alberto Garcia, qemu-block, zhang_youjia, qemu-devel, Max Reitz, andrey.shinkevich

On 23.11.2020 at 16:49, Maxim Levitsky wrote:
> Commit 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
> introduced a subtle change to code in zero_in_l2_slice:
>
> It swapped the order of
>
> 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> 2. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
> 3. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
>
> To
>
> 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> 2. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
> 3. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
>
> It seems harmless, however the call to qcow2_free_any_clusters
> can trigger a cache flush which can mark the L2 table as clean,
> and assuming that this was the last write to it,
> a stale version of it will remain on the disk.

Do you have more details on this last paragraph? I'm trying to come up
with a reproducer, but I don't see how qcow2_free_any_clusters() could
flush the L2 table cache. (It's easy to get it to flush the refcount
block cache, but that's useless for a reproducer.)

The only way I see to flush any cache with it is the
qcow2_cache_set_dependency() call in update_refcount(). This will
always flush the cache that the L2 cache depends on - which will never
be the L2 cache itself, but always either the refcount cache or
nothing.

There are more options in alloc_refcount_block() if we're allocating a
new refcount block, but in the context of freeing clusters we'll never
need to do that.

Whatever I tried, at the end of zero_in_l2_slice(), I have a dirty L2
table and a dirty refcount block in the cache, with a dependency that
makes sure that the L2 table will be written out first.

If you don't have the information yet, can you try to debug your manual
reproducer a bit more to find out how this happens?

Kevin

> Now we have a valid L2 entry pointing to a freed cluster. Oops.
>
> Fixes: 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  block/qcow2-cluster.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 485b4cb92e..267b46a4ca 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -2010,11 +2010,11 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
>              continue;
>          }
>  
> -        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
>          if (unmap) {
>              qcow2_free_any_cluster(bs, old_l2_entry, QCOW2_DISCARD_REQUEST);
>          }
>          set_l2_entry(s, l2_slice, l2_index + i, new_l2_entry);
> +        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
>          if (has_subclusters(s)) {
>              set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
>          }
> -- 
> 2.26.2
* Re: [PATCH 1/1] Fix qcow2 corruption on discard
From: Maxim Levitsky @ 2020-11-23 18:11 UTC
To: Kevin Wolf
Cc: vsementsov, Alberto Garcia, qemu-block, zhang_youjia, qemu-devel, Max Reitz, andrey.shinkevich

On Mon, 2020-11-23 at 18:38 +0100, Kevin Wolf wrote:
> On 23.11.2020 at 16:49, Maxim Levitsky wrote:
> > Commit 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
> > introduced a subtle change to code in zero_in_l2_slice:
> >
> > It swapped the order of
> >
> > 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> > 2. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
> > 3. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
> >
> > To
> >
> > 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> > 2. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
> > 3. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
> >
> > It seems harmless, however the call to qcow2_free_any_clusters
> > can trigger a cache flush which can mark the L2 table as clean,
> > and assuming that this was the last write to it,
> > a stale version of it will remain on the disk.
>
> Do you have more details on this last paragraph? I'm trying to come up
> with a reproducer, but I don't see how qcow2_free_any_clusters() could
> flush the L2 table cache. (It's easy to get it to flush the refcount
> block cache, but that's useless for a reproducer.)
>
> The only way I see to flush any cache with it is in update_refcount()
> the qcow2_cache_set_dependency() call. This will always flush the cache
> that the L2 cache depends on - which will never be the L2 cache itself,
> but always either the refcount cache or nothing.
>
> There are more options in alloc_refcount_block() if we're allocating a
> new refcount block, but in the context of freeing clusters we'll never
> need to do that.
>
> Whatever I tried, at the end of zero_in_l2_slice(), I have a dirty L2
> table and a dirty refcount block in the cache, with a dependency that
> makes sure that the L2 table will be written out first.
>
> If you don't have the information yet, can you try to debug your manual
> reproducer a bit more to find out how this happens?

I'll do this tomorrow.

Best regards,
	Maxim Levitsky

> Kevin
>
> > Now we have a valid L2 entry pointing to a freed cluster. Oops.
> >
> > Fixes: 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  block/qcow2-cluster.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> > index 485b4cb92e..267b46a4ca 100644
> > --- a/block/qcow2-cluster.c
> > +++ b/block/qcow2-cluster.c
> > @@ -2010,11 +2010,11 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
> >              continue;
> >          }
> >  
> > -        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> >          if (unmap) {
> >              qcow2_free_any_cluster(bs, old_l2_entry, QCOW2_DISCARD_REQUEST);
> >          }
> >          set_l2_entry(s, l2_slice, l2_index + i, new_l2_entry);
> > +        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> >          if (has_subclusters(s)) {
> >              set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
> >          }
> > -- 
> > 2.26.2
* Re: [PATCH 1/1] Fix qcow2 corruption on discard
From: Kevin Wolf @ 2020-11-24 9:17 UTC
To: Maxim Levitsky
Cc: vsementsov, Alberto Garcia, qemu-block, zhang_youjia, qemu-devel, Max Reitz, andrey.shinkevich

On 23.11.2020 at 19:11, Maxim Levitsky wrote:
> On Mon, 2020-11-23 at 18:38 +0100, Kevin Wolf wrote:
> > On 23.11.2020 at 16:49, Maxim Levitsky wrote:
> > > Commit 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
> > > introduced a subtle change to code in zero_in_l2_slice:
> > >
> > > It swapped the order of
> > >
> > > 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> > > 2. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
> > > 3. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
> > >
> > > To
> > >
> > > 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> > > 2. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
> > > 3. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
> > >
> > > It seems harmless, however the call to qcow2_free_any_clusters
> > > can trigger a cache flush which can mark the L2 table as clean,
> > > and assuming that this was the last write to it,
> > > a stale version of it will remain on the disk.
> >
> > Do you have more details on this last paragraph? I'm trying to come up
> > with a reproducer, but I don't see how qcow2_free_any_clusters() could
> > flush the L2 table cache. (It's easy to get it to flush the refcount
> > block cache, but that's useless for a reproducer.)
> >
> > The only way I see to flush any cache with it is in update_refcount()
> > the qcow2_cache_set_dependency() call. This will always flush the cache
> > that the L2 cache depends on - which will never be the L2 cache itself,
> > but always either the refcount cache or nothing.
> >
> > There are more options in alloc_refcount_block() if we're allocating a
> > new refcount block, but in the context of freeing clusters we'll never
> > need to do that.
> >
> > Whatever I tried, at the end of zero_in_l2_slice(), I have a dirty L2
> > table and a dirty refcount block in the cache, with a dependency that
> > makes sure that the L2 table will be written out first.
> >
> > If you don't have the information yet, can you try to debug your manual
> > reproducer a bit more to find out how this happens?
>
> I'll do this tomorrow.

As the last RC for 5.2 is today, I will send a v2 that changes the fix
to restore the original order. We can then continue work to find a
minimal reproducer and merge the test case in the early 6.0 cycle.

Kevin
* Re: [PATCH 1/1] Fix qcow2 corruption on discard
From: Alberto Garcia @ 2020-11-24 18:59 UTC
To: Kevin Wolf, Maxim Levitsky
Cc: vsementsov, qemu-block, zhang_youjia, qemu-devel, Max Reitz, andrey.shinkevich

On Tue 24 Nov 2020 10:17:23 AM CET, Kevin Wolf wrote:
> We can then continue work to find a minimal reproducer and merge the
> test case in the early 6.0 cycle.

I haven't been able to reproduce the problem yet, do you have any
findings?

Berto
* Re: [PATCH 1/1] Fix qcow2 corruption on discard
From: Maxim Levitsky @ 2020-11-24 18:59 UTC
To: Alberto Garcia, Kevin Wolf
Cc: vsementsov, qemu-block, zhang_youjia, qemu-devel, Max Reitz, andrey.shinkevich

On Tue, 2020-11-24 at 19:59 +0100, Alberto Garcia wrote:
> On Tue 24 Nov 2020 10:17:23 AM CET, Kevin Wolf wrote:
> > We can then continue work to find a minimal reproducer and merge the
> > test case in the early 6.0 cycle.
>
> I haven't been able to reproduce the problem yet, do you have any
> findings?
>
> Berto

I have a working reproducer script. I'll send it in an hour or so.

Best regards,
	Maxim Levitsky
* Re: [PATCH 1/1] Fix qcow2 corruption on discard
From: Maxim Levitsky @ 2020-11-24 19:44 UTC
To: Alberto Garcia, Kevin Wolf
Cc: vsementsov, qemu-block, zhang_youjia, qemu-devel, Max Reitz, andrey.shinkevich

[-- Attachment #1: Type: text/plain, Size: 2830 bytes --]

On Tue, 2020-11-24 at 20:59 +0200, Maxim Levitsky wrote:
> On Tue, 2020-11-24 at 19:59 +0100, Alberto Garcia wrote:
> > On Tue 24 Nov 2020 10:17:23 AM CET, Kevin Wolf wrote:
> > > We can then continue work to find a minimal reproducer and merge the
> > > test case in the early 6.0 cycle.
> >
> > I haven't been able to reproduce the problem yet, do you have any
> > findings?
> >
> > Berto
>
> I have a working reproducer script. I'll send it in an hour or so.

I have attached a minimal reproducer for this issue.
I can convert this to an iotest if you think that this is worth it.

These are the exact conditions for the corruption to happen:

1. The image must have at least 5 refcount tables (one more than the
   default refcount table cache size, which is 4 by default).

2. An IO pattern that fully populates the 4-entry refcount table cache:

   The easiest way to do this is to have 4 L2 entries populated in the
   base image, such that each entry references a physical cluster
   served by a different refcount table. Then discard these entries in
   the snapshot, triggering discards in the base file during the
   commit, which populates the refcount table cache.

3. A discard of a cluster that belongs to the 5th refcount table (done
   in exactly the same way as the discards above). It has to happen
   soon, before the L2 cache is flushed by some unrelated IO.

This triggers the corruption. The call stack is:

qcow2_free_any_cluster ->
  qcow2_free_clusters ->
    update_refcount:
      // This sets a dependency between flushing the refcount cache
      // and the L2 cache
      if (decrease)
          qcow2_cache_set_dependency(bs, s->refcount_block_cache,
                                     s->l2_table_cache);
      ret = alloc_refcount_block(bs, cluster_index, &refcount_block);
        return load_refcount_block(bs, refcount_block_offset,
                                   refcount_block);
          return qcow2_cache_get(...
            qcow2_cache_do_get
              /* because of a cache miss, we have to evict an entry */
              ret = qcow2_cache_entry_flush(bs, c, i);
                if (c->depends) {
                    /* this flushes the L2 cache */
                    ret = qcow2_cache_flush_dependency(bs, c);
                }

I have attached a reproducer that works with almost any cluster size
and refcount block size. Cluster sizes below 4K don't work because the
commit is done by the mirror job, which works at 4K granularity, and
that results in it not doing any discards due to various alignment
restrictions. If I patch qemu to make the mirror job work at 512B
granularity, the test reproduces for small clusters as well.

The reproducer creates a qcow2 image in the current directory, and it
needs about 11G for the default parameters (64K cluster size, 16-bit
refcounts). For 4K cluster size and 64-bit refcounts, it needs only
11M. (This can be changed by editing the script.)

Best regards,
	Maxim Levitsky

[-- Attachment #2: reproducer.sh --]
[-- Type: application/x-shellscript, Size: 2088 bytes --]
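[Editor's note: the cache interaction in the call stack above can be
sketched with a toy model. This is an illustration, not QEMU code: the
`Cache` class below only mimics the relevant behavior of QEMU's
qcow2 metadata caches (fixed size, eviction on a miss, and a dependency
that is flushed before the dependent cache); all names are invented.]

```python
class Cache:
    def __init__(self, name, size):
        self.name, self.size = name, size
        self.entries = []     # resident entries, oldest first
        self.depends = None   # cache that must be flushed before this one

    def set_dependency(self, other):
        self.depends = other

    def flush(self, log):
        if self.depends is not None:
            self.depends.flush(log)   # flush the dependency first
            self.depends = None
        log.append(self.name)

    def get(self, key, log):
        if key in self.entries:
            self.entries.remove(key)          # cache hit
        elif len(self.entries) >= self.size:
            self.entries.pop(0)               # miss with a full cache:
            self.flush(log)                   # evict, which flushes
        self.entries.append(key)

log = []
l2_cache = Cache("l2", 4)
refcount_cache = Cache("refcount", 4)
refcount_cache.set_dependency(l2_cache)   # as done by update_refcount()

for block in range(5):                    # discards touch 5 refcount blocks
    refcount_cache.get(block, log)

print(log)  # ['l2', 'refcount']
```

Touching the 5th refcount block misses in the full 4-entry cache, the
eviction flushes the refcount cache, and the dependency flushes the L2
cache first. In the buggy ordering that L2 flush happens while the L2
entry still holds its pre-discard value, which is exactly the stale
entry described in the cover letter.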
* Re: [PATCH 1/1] Fix qcow2 corruption on discard
From: Alberto Garcia @ 2020-11-25 16:49 UTC
To: Maxim Levitsky, Kevin Wolf
Cc: vsementsov, qemu-block, zhang_youjia, qemu-devel, Max Reitz, andrey.shinkevich

On Tue 24 Nov 2020 08:44:00 PM CET, Maxim Levitsky wrote:
> On Tue, 2020-11-24 at 20:59 +0200, Maxim Levitsky wrote:
>> On Tue, 2020-11-24 at 19:59 +0100, Alberto Garcia wrote:
>> > On Tue 24 Nov 2020 10:17:23 AM CET, Kevin Wolf wrote:
>> > > We can then continue work to find a minimal reproducer and merge the
>> > > test case in the early 6.0 cycle.
>> >
>> > I haven't been able to reproduce the problem yet, do you have any
>> > findings?
>> >
>> > Berto
>>
>> I have a working reproducer script. I'll send it in an hour or so.
>
> I have attached a minimal reproducer for this issue.
> I can convert this to an iotest if you think that this is worth it.

I think it would be a good idea to have an iotest; I can also prepare it
if you don't have time. Thanks for the script!

Berto