From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:38916) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XWMya-0005Wt-Dy for qemu-devel@nongnu.org; Tue, 23 Sep 2014 06:05:46 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XWMyS-0001WW-0S for qemu-devel@nongnu.org; Tue, 23 Sep 2014 06:05:39 -0400 Received: from mx1.redhat.com ([209.132.183.28]:43530) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XWMyR-0001Vm-PP for qemu-devel@nongnu.org; Tue, 23 Sep 2014 06:05:31 -0400 Date: Tue, 23 Sep 2014 12:05:17 +0200 From: Kevin Wolf Message-ID: <20140923100517.GE3871@noname.str.redhat.com> References: <541AE887.9050607@redhat.com> <541C3095.4040405@redhat.com> <541FEF3A.30909@kamp.de> <54207349.2000107@redhat.com> <54210FEB.50900@kamp.de> <20140923085917.GB3871@noname.str.redhat.com> <54213E3B.5040302@kamp.de> <20140923094729.GD3871@noname.str.redhat.com> <542142D7.3040006@kamp.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <542142D7.3040006@kamp.de> Subject: Re: [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Peter Lieven Cc: Paolo Bonzini , ronniesahlberg@gmail.com, qemu-devel@nongnu.org, stefanha@redhat.com, mreitz@redhat.com Am 23.09.2014 um 11:52 hat Peter Lieven geschrieben: > On 23.09.2014 11:47, Kevin Wolf wrote: > >Am 23.09.2014 um 11:32 hat Peter Lieven geschrieben: > >>On 23.09.2014 10:59, Kevin Wolf wrote: > >>>Am 23.09.2014 um 08:15 hat Peter Lieven geschrieben: > >>>>On 22.09.2014 21:06, Paolo Bonzini wrote: > >>>>>Il 22/09/2014 11:43, Peter Lieven ha scritto: > >>>>>>This series aims not at touching default behaviour. The default for max_transfer_length > >>>>>>is 0 (no limit). max_transfer_length is a limit that MUST be satisfied otherwise the request > >>>>>>will fail. And Patch 2 aims at catching this fail earlier in the stack. > >>>>>Understood. But the right fix is to avoid that backend limits transpire > >>>>>into guest ABI, not to catch the limits earlier. So the right fix would > >>>>>be to implement request splitting. > >>>>Since you proposed to add traces for this would you leave those in? > >>>>And since iSCSI is the only user of this at the moment would you > >>>>go for implementing this check in the iSCSI block layer? > >>>> > >>>>As for the split logic would you think it is enough to build new qiov's > >>>>out of the too big iov without copying the contents? This would work > >>>>as long as a single iov inside the qiov is not bigger the max_transfer_length. > >>>>Otherwise I would need to allocate temporary buffers and copy around. > >>>You can split single iovs, too. There are functions that make this very > >>>easy, they copy a sub-qiov with a byte granularity offset and length > >>>(qemu_iovec_concat and friends). qcow2 uses the same to split requests > >>>at (fragmented) cluster boundaries. > >>Might it be as easy as this? > >This is completely untested, right? :-) > > Yes :-) > I was just unsure if the general approach is right. Looks alright to me. > >But ignoring bugs, the principle looks right. > > > >>static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs, > >> int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, > >> BdrvRequestFlags flags) > >>{ > >> if (nb_sectors < 0 || nb_sectors > (UINT_MAX >> BDRV_SECTOR_BITS)) { > >> return -EINVAL; > >> } > >> > >> if (bs->bl.max_transfer_length && > >> nb_sectors > bs->bl.max_transfer_length) { > >> int ret = 0; > >> QEMUIOVector *qiov2 = NULL; > >Make it "QEMUIOVector qiov2;" on the stack. > > > >> size_t soffset = 0; > >> > >> trace_bdrv_co_do_readv_toobig(bs, sector_num, nb_sectors, > >>bs->bl.max_transfer_length); > >> > >> qemu_iovec_init(qiov2, qiov->niov); > >And &qiov2 here, then this doesn't crash with a NULL dereference. > > > >> while (nb_sectors > bs->bl.max_transfer_length && !ret) { > >> qemu_iovec_reset(qiov2); > >> qemu_iovec_concat(qiov2, qiov, soffset, > >> bs->bl.max_transfer_length << BDRV_SECTOR_BITS); > >> ret = bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS, > >> bs->bl.max_transfer_length << BDRV_SECTOR_BITS, > >> qiov2, flags); > >> soffset += bs->bl.max_transfer_length << BDRV_SECTOR_BITS; > >> sector_num += bs->bl.max_transfer_length; > >> nb_sectors -= bs->bl.max_transfer_length; > >> } > >> qemu_iovec_destroy(qiov2); > >> if (ret) { > >> return ret; > >> } > >The error check needs to be immediately after the assignment of ret, > >otherwise the next loop iteration can overwrite an error with a success > >(and if it didn't, it would still do useless I/O because the request as > >a whole would fail anyway). > > There is a && !ret in the loop condition. I wanted to avoid copying the destroy part. Ah, yes, clever. I missed that. Maybe too clever then. ;-) > BTW, is it !ret or ret < 0 ? It only ever returns 0 or negative, so both are equivalent. I prefer checks for ret < 0, but that's a matter of style rather than correctness. Another problem I just noticed is that this is not the only caller of bdrv_co_do_preadv(), so you're not splitting all requests. The synchronous bdrv_read/write/pread/pwrite/pwritev functions all don't get the functionality this way. Perhaps you should be doing it inside bdrv_co_do_preadv(), before the call to bdrv_aligned_preadv(). Might even be more correct if it can happen that the alignment adjustment increases a request too much to fit in bl.max_transfer_length. Kevin