From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:36162)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from <pl@kamp.de>)
	id 1XWMm6-0008J3-JQ
	for qemu-devel@nongnu.org; Tue, 23 Sep 2014 05:52:55 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pl@kamp.de>) id 1XWMlu-0006WT-Ct
	for qemu-devel@nongnu.org; Tue, 23 Sep 2014 05:52:46 -0400
Received: from mx-v6.kamp.de ([2a02:248:0:51::16]:41874 helo=mx01.kamp.de)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <pl@kamp.de>)
	id 1XWMlu-0006VU-37
	for qemu-devel@nongnu.org; Tue, 23 Sep 2014 05:52:34 -0400
Message-ID: <542142D7.3040006@kamp.de>
Date: Tue, 23 Sep 2014 11:52:23 +0200
From: Peter Lieven <pl@kamp.de>
MIME-Version: 1.0
References: <1409935888-18552-1-git-send-email-pl@kamp.de>
	<1409935888-18552-5-git-send-email-pl@kamp.de>
	<541AE887.9050607@redhat.com>
	<B9522E44-7F3C-4954-8F53-A4DC214F408F@kamp.de>
	<541C3095.4040405@redhat.com> <541FEF3A.30909@kamp.de>
	<54207349.2000107@redhat.com> <54210FEB.50900@kamp.de>
	<20140923085917.GB3871@noname.str.redhat.com>
	<54213E3B.5040302@kamp.de>
	<20140923094729.GD3871@noname.str.redhat.com>
In-Reply-To: <20140923094729.GD3871@noname.str.redhat.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes
	in multiwrite_merge
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>, ronniesahlberg@gmail.com, qemu-devel@nongnu.org, stefanha@redhat.com, mreitz@redhat.com

On 23.09.2014 11:47, Kevin Wolf wrote:
> Am 23.09.2014 um 11:32 hat Peter Lieven geschrieben:
>> On 23.09.2014 10:59, Kevin Wolf wrote:
>>> Am 23.09.2014 um 08:15 hat Peter Lieven geschrieben:
>>>> On 22.09.2014 21:06, Paolo Bonzini wrote:
>>>>> Il 22/09/2014 11:43, Peter Lieven ha scritto:
>>>>>> This series aims not at touching default behaviour. The default for max_transfer_length
>>>>>> is 0 (no limit). max_transfer_length is a limit that MUST be satisfied otherwise the request
>>>>>> will fail. And Patch 2 aims at catching this fail earlier in the stack.
>>>>> Understood.  But the right fix is to avoid that backend limits transpire
>>>>> into guest ABI, not to catch the limits earlier.  So the right fix would
>>>>> be to implement request splitting.
>>>> Since you proposed to add traces for this would you leave those in?
>>>> And since iSCSI is the only user of this at the moment would you
>>>> go for implementing this check in the iSCSI block layer?
>>>>
>>>> As for the split logic would you think it is enough to build new qiov's
>>>> out of the too big iov without copying the contents? This would work
>>>> as long as a single iov inside the qiov is not bigger the max_transfer_length.
>>>> Otherwise I would need to allocate temporary buffers and copy around.
>>> You can split single iovs, too. There are functions that make this very
>>> easy, they copy a sub-qiov with a byte granularity offset and length
>>> (qemu_iovec_concat and friends). qcow2 uses the same to split requests
>>> at (fragmented) cluster boundaries.
>> Might it be as easy as this?
> This is completely untested, right? :-)

Yes :-)
I was just unsure if the general approach is right.

>
> But ignoring bugs, the principle looks right.
>
>> static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
>>      int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
>>      BdrvRequestFlags flags)
>> {
>>      if (nb_sectors < 0 || nb_sectors > (UINT_MAX >> BDRV_SECTOR_BITS)) {
>>          return -EINVAL;
>>      }
>>
>>      if (bs->bl.max_transfer_length &&
>>          nb_sectors > bs->bl.max_transfer_length) {
>>          int ret = 0;
>>          QEMUIOVector *qiov2 = NULL;
> Make it "QEMUIOVector qiov2;" on the stack.
>
>>          size_t soffset = 0;
>>
>>          trace_bdrv_co_do_readv_toobig(bs, sector_num, nb_sectors,
>> bs->bl.max_transfer_length);
>>
>>          qemu_iovec_init(qiov2, qiov->niov);
> And &qiov2 here, then this doesn't crash with a NULL dereference.
>
>>          while (nb_sectors > bs->bl.max_transfer_length && !ret) {
>>              qemu_iovec_reset(qiov2);
>>              qemu_iovec_concat(qiov2, qiov, soffset,
>>                                bs->bl.max_transfer_length << BDRV_SECTOR_BITS);
>>              ret = bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
>>                                      bs->bl.max_transfer_length << BDRV_SECTOR_BITS,
>>                                      qiov2, flags);
>>              soffset += bs->bl.max_transfer_length << BDRV_SECTOR_BITS;
>>              sector_num += bs->bl.max_transfer_length;
>>              nb_sectors -= bs->bl.max_transfer_length;
>>          }
>>          qemu_iovec_destroy(qiov2);
>>          if (ret) {
>>              return ret;
>>          }
> The error check needs to be immediately after the assignment of ret,
> otherwise the next loop iteration can overwrite an error with a success
> (and if it didn't, it would still do useless I/O because the request as
> a whole would fail anyway).

There is a && !ret in the loop condition. I wanted to avoid copying the destroy part.

BTW, is it !ret or ret < 0 ?

>
>>      }
>>
>>      return bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
>>                               nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
> qiov doesn't work here for the splitting case. You need the remaining
> part, not the whole original qiov.

Right ;-)

Peter