qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: John Snow <jsnow@redhat.com>
To: Eric Blake <eblake@redhat.com>,
	Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	"qemu-block@nongnu.org" <qemu-block@nongnu.org>
Cc: "kwolf@redhat.com" <kwolf@redhat.com>,
	Peter Maydell <peter.maydell@linaro.org>,
	Peter Krempa <pkrempa@redhat.com>,
	"libvirt-list@redhat.com" <libvirt-list@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"mreitz@redhat.com" <mreitz@redhat.com>
Subject: Re: [PATCH for-4.2?] block/qcow2-bitmap: fix crash bug in qcow2_co_remove_persistent_dirty_bitmap
Date: Fri, 6 Dec 2019 14:02:29 -0500	[thread overview]
Message-ID: <8c7c5f50-1899-3457-e1bc-77d8fee87de7@redhat.com> (raw)
In-Reply-To: <36655f7a-7ea5-e80a-ebfd-5b19c90622c0@redhat.com>



On 12/6/19 9:36 AM, Eric Blake wrote:
> [adding in Peter Maydell, as there is now potential talk of other
> 4.2-worthy patches]
> 
> On 12/6/19 4:18 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 05.12.2019 23:16, John Snow wrote:
>>>
>>>
>>> On 12/5/19 3:09 PM, Eric Blake wrote:
>>>> On 12/5/19 1:30 PM, Vladimir Sementsov-Ogievskiy wrote:
>>>>> Here is double bug:
>>>>>
>>>>> First, return error but not set errp. This may lead to:
>>>>> qmp block-dirty-bitmap-remove may report success when actually failed
>>>>>
>>>>> block-dirty-bitmap-remove used in a transaction will crash, as
>>>>> qmp_transaction will think that it returned success and will cal
>>>>
>>>> call
>>>>
>>>>> block_dirty_bitmap_remove_commit which will crash, as state->bitmap is
>>>>> NULL
>>>>>
>>>>> Second (like in anecdote), this case is not an error at all. As it is
>>>>> documented in the comment above bdrv_co_remove_persistent_dirty_bitmap
>>>>> definition, absence of bitmap is not an error, and similar case
>>>>> handled
>>>>> at start of qcow2_co_remove_persistent_dirty_bitmap, it returns 0 when
>>>>> there is no bitmaps at all..
>>>>
>>>> double .
>>>>
>>>>>
>>>>> But when there are some bitmaps, but not the requested one, it return
>>>>> error with errp unset.
>>>>>
>>>>> Fix that.
>>>>>
>>>>> Fixes: b56a1e31759b750
>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>> ---
>>>>>
>>>>> Hi all!
>>>>>
>>>>> Ohm, suddenly we faced this bug. It's a regression in 4.2. I'm very
>>>>> sorry for introducing it, and it sad that it's found so late..
>>>>>
>>>>> Personally, I think that this one worth rc5, as it makes new bitmap
>>>>> interfaces unusable. But the decision is yours.
>>>>>
>>>>> Last minute edit: hmm, actually, transaction action introduced in
>>>>> 4.2, so crash is not a regression, only broken
>>>>> block-dirty-bitmap-remove
>>>>> command is a regression... Maybe it's OK for stable.
>>>>
>>>> Libvirt REALLY wants to use transaction bitmap management (and require
>>>> qemu 4.2) for its incremental backup patches that Peter is almost ready
>>>> to merge in.  I'm trying to ascertain:
>>>>
>>>> When exactly can this bug hit?  Can you give a sequence of QMP commands
>>>> that will trigger it for easy reproduction?  Are there workarounds
>>>> (such
>>>> as checking that a bitmap exists prior to attempting to remove it)?  If
>>>> it does NOT get fixed in rc5, how will libvirt be able to probe whether
>>>> the fix is in place?
>>>>
>>>
>>> It looks like:
>>>
>>> - You need to have at least one bitmap
>>> - You need to use transactional remove
>>> - you need to misspell the bitmap name
>>> - The remove will fail (return -EINVAL) but doesn't set errp
>>> - The caller chokes on incomplete information, state->bitmap is NULL
>>
>>
>> No, that would be too simple, the thing is worse. Absolutele correct
>> removing is broken, without any misspelling
>>
>> Bug triggers when we are removing persistent bitmap that is not stored
>> yet in the image AND at least one (another) bitmap already stored in
>> the image. So, something like:
>>
>> 1. create persistent bitmap A
>> 2. shutdown vm  (bitmap A is synced)
>> 3. start vm
>> 4. create persistent bitmap B
>> 5. remove bitmap B - it fails (and crashes if in transaction)
>>
>> ====
>>
>> Hmm, workaround..
>>
>> I'm afraid that the only thing is not remove persistent bitmaps, which
>> were never synced to the image. So, instead the sequence above, we need
>>
>>
>> 1. create persistent bitmap A
>> 2. shutdown vm
>> 3. start vm
>> 4. create persistent bitmap B
>> 5. remember, that we want to remove bitmap B after vm shutdown
>> ...
>>    some other operations
>> ...
>> 6. vm shutdown
>> 7. start vm in stopped mode, and remove all bitmaps marked for removing
>> 8. stop vm
>>
>> But, I think that in real circumstances, vm shutdown is rare thing...
> 
> This is sounding a bit more serious. As I said earlier, it shouldn't
> delay 4.2 on its own, but if the fix is obvious (and other than
> comments, it is a single change from 'ret = -EINVAL' to 'ret = 0' which
> fixes a definite reproducible crash), I think it rises to the level of
> acceptable.
> 
> I've been so worried about the question of which release, that I don't
> know if I've previously offered:
> Reviewed-by: Eric Blake <eblake@redhat.com>
> 

Oh, that is quite a bit more serious than I thought too.

Yeah, I want this in 4.2 if at all possible.

Reviewed-by: John Snow <jsnow@redhat.com>



  reply	other threads:[~2019-12-06 19:08 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-05 19:30 [PATCH for-4.2?] block/qcow2-bitmap: fix crash bug in qcow2_co_remove_persistent_dirty_bitmap Vladimir Sementsov-Ogievskiy
2019-12-05 19:39 ` John Snow
2019-12-05 20:09 ` Eric Blake
2019-12-05 20:16   ` John Snow
2019-12-05 21:53     ` Eric Blake
2019-12-05 22:00       ` John Snow
2019-12-06 10:18     ` Vladimir Sementsov-Ogievskiy
2019-12-06 14:36       ` Eric Blake
2019-12-06 19:02         ` John Snow [this message]
2019-12-06 19:48           ` Eric Blake
2019-12-06 14:29   ` Vladimir Sementsov-Ogievskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8c7c5f50-1899-3457-e1bc-77d8fee87de7@redhat.com \
    --to=jsnow@redhat.com \
    --cc=eblake@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=libvirt-list@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=pkrempa@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).