Re: [Qemu-devel] [PATCH 1/3] qcow2: Check bs->drv in copy_sectors()

From: Max Reitz <mreitz@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
	Kevin Wolf <kwolf@redhat.com>, Laszlo Ersek <lersek@redhat.com>
Cc: qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 1/3] qcow2: Check bs->drv in copy_sectors()
Date: Tue, 11 Mar 2014 19:19:08 +0100	[thread overview]
Message-ID: <531F539C.2080800@redhat.com> (raw)
In-Reply-To: <531F17EF.9000500@redhat.com>

On 11.03.2014 15:04, Paolo Bonzini wrote:
> Il 11/03/2014 11:16, Kevin Wolf ha scritto:
>> Am 11.03.2014 um 00:16 hat Laszlo Ersek geschrieben:
>>> On 03/10/14 23:44, Max Reitz wrote:
>>>> Before dereferencing bs->drv for a call to its member bdrv_co_readv(),
>>>> copy_sectors() should check whether that pointer is indeed valid, 
>>>> since
>>>> it may have been set to NULL by e.g. a concurrent write triggering the
>>>> corruption prevention mechanism.
>>>>
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>> To be precise, this still is a race condition. If bs->drv is set to 
>>>> NULL
>>>> after the check and before the call to bdrv_co_readv(), QEMU will
>>>> obviously still crash. However, in order to circumvent this 
>>>> behavior, we
>>>> would probably have to re-lock s->lock, check bs->drv, take the 
>>>> function
>>>> pointer to bdrv_co_readv() and then unlock s->lock before the function
>>>> is called. I found this rather ugly and therefore this still has a 
>>>> very
>>>> small chance of running into a race condition.
>>>> Therefore, I'm asking for your opinion on this, whether we can really
>>>> take this chance or should rather "do it right". In fact, if I were a
>>>> reviewer, I'd probably reject this patch and request the solution with
>>>> the function pointer (if there is no better solution), but I was 
>>>> afraid
>>>> to send such an ugly patch.
>>
>> No, the code is fine. Remember that qcow2 is not threaded, we're talking
>> about coroutines here. There is no way for the code to yield between
>> your check and the protected place.
>>
>>>>  block/qcow2-cluster.c | 4 ++++
>>>>  1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>>>> index 36c1bed..9499df9 100644
>>>> --- a/block/qcow2-cluster.c
>>>> +++ b/block/qcow2-cluster.c
>>>> @@ -380,6 +380,10 @@ static int coroutine_fn 
>>>> copy_sectors(BlockDriverState *bs,
>>>>
>>>>      BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
>>>>
>>>> +    if (!bs->drv) {
>>>> +        return -ENOMEDIUM;
>>>> +    }
>>>> +
>>>>      /* Call .bdrv_co_readv() directly instead of using the public 
>>>> block-layer
>>>>       * interface.  This avoids double I/O throttling and request 
>>>> tracking,
>>>>       * which can lead to deadlock when block layer copy-on-read is 
>>>> enabled.
>>>>
>>>
>>> I can't answer your question nor review this patch -- instead, I have a
>>> question of my own: when you say "set to NULL by [...] the corruption
>>> prevention mechanism", do you mean qcow2_pre_write_overlap_check():
>>>
>>>         bs->drv = NULL; /* make BDS unusable */
>>
>> Yes, this is the place.
>>
>>> If so: I thought that it was quite a bold move, but also that we'd find
>>> the SIGSEGVs sooner or later... :)
>>
>> In fact, if you use the block layer API, most functions check for
>> bs->drv and return -ENOMEDIUM if it is NULL. The problem here is that we
>> directly dereference the pointer without going through block.c (there's
>> a good reason for this, see the comment, but it still makes it somewhat
>> special).
>
> But why not call qcow2_co_readv directly?

qcow2_co_readv() would probably actually perform the operation, as far 
as I can see, since it does not check whether the medium is corrupt 
(this is only checked when opening the block device). This is the reason 
why bs->drv is set to NULL, so noone is able to read from the block 
device anymore, even without checking for corruption (additionally, on 
version 2 images, there is not even a way to indicate corruption other 
than setting bs->drv to NULL).
Normally, every read operation would pass through bdrv_co_readv(), which 
will (have to) check whether bs->drv is valid or NULL - since this code 
decides to skip that and call the driver function directly, we therefore 
have to check bs->drv anyway, since it is the only way of telling us 
whether the image may actually be accessed right here. We could change 
the bs->drv->bdrv_co_readv() call to qcow2_co_readv(), but then people 
will probably ask why there is a bs->drv check right before.

Max