On 2017-11-10 22:54, Eric Blake wrote:
> On 11/10/2017 02:31 PM, Max Reitz wrote:
>> Instead of using an assertion, it is better to emit a corruption event
>> here. Checking all offsets for correct alignment can be tedious and it
>> is easily possible to forget to do so. qcow2_cache_do_get() is a
>> function every L2 and refblock access has to go through, so this is a
>> good central point to add such a check.
>>
>> And for good measure, let us also add an assertion that the offset is
>> non-zero. Making this a corruption event is not feasible, because a
>> zero offset usually means something special (such as the cluster is
>> unused), so all callers should be checking this anyway. If they do not,
>> it is their fault, hence the assertion here.
>>
>> Signed-off-by: Max Reitz
>> ---
>>  block/qcow2-cache.c        | 21 +++++++++++++++++++++
>>  tests/qemu-iotests/060     | 21 +++++++++++++++++++++
>>  tests/qemu-iotests/060.out | 29 +++++++++++++++++++++++++++++
>>  3 files changed, 71 insertions(+)
>>
>
>> +--- Repairing ---
>> +Repairing refcount block 1 is outside image
>> +ERROR refcount block 2 is not cluster aligned; refcount table entry corrupted
>> +qcow2: Marking image as corrupt: Refblock offset 0x200 unaligned (reftable index: 0x2); further corruption events will be suppressed
>> +Can't get refcount for cluster 1048576: Input/output error
>
> Trying to understand this: we have a double corruption, because we
> encountered a refblock that points outside of the image, but fixing the
> refblock in turn encounters a second refblock that points within the
> image but to an unaligned area.

No, it's the very same. As far as I've seen it, the repair function
tries to fix the "refblock is outside image" error by resizing the image
so the refblock is inside the image. However, the subsequent
bdrv_truncate() detects the alignment corruption, too, and thus marks
the image corrupt. The check function itself never marks the image
corrupt because it's doing its best to fix it.
:-)

(And the only point in marking an image corrupt is to force the user to
repair it.)

And that's also the reason why we need to invoke the repair twice: On
the first run the check function notices that the image is so broken we
need to create new refcount structures, so it does that. But it cannot
free the old structures (or something) because bs->drv == NULL by now.
And then it cannot be run a second time because !bs->drv. So you need
to manually invoke it a second time, and then it goes over the newly
created refcount structures which are then fixed up.

> Of course, you should never encounter these bad refblocks in normal
> usage, but when it comes to dealing with untrusted images, being robust
> is always worth it.
>
> Reviewed-by: Eric Blake

Thanks!

Max