Message-ID: <54CF9833.4070305@openvz.org>
Date: Mon, 2 Feb 2015 18:30:59 +0300
From: "Denis V. Lunev"
To: Kevin Wolf, Peter Lieven
Cc: Fam Zheng, qemu-devel@nongnu.org, Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH 7/7] block/raw-posix: set max_write_zeroes to INT_MAX for regular files
In-Reply-To: <20150202144923.GI9478@noname.redhat.com>
References: <1422607337-25335-1-git-send-email-den@openvz.org>
 <1422607337-25335-8-git-send-email-den@openvz.org>
 <20150202132355.GC9478@noname.redhat.com> <54CF81DA.3020003@kamp.de>
 <20150202140452.GG9478@noname.redhat.com> <54CF85BE.6030302@kamp.de>
 <20150202141617.GH9478@noname.redhat.com> <54CF87B1.1070404@kamp.de>
 <20150202144923.GI9478@noname.redhat.com>

On 02/02/15 17:49, Kevin Wolf wrote:
> On 02.02.2015 at 15:20, Peter Lieven wrote:
>> On 02.02.2015 at 15:16, Kevin Wolf wrote:
>>> On 02.02.2015 at 15:12, Peter Lieven wrote:
>>>> On 02.02.2015 at 15:04, Kevin Wolf wrote:
>>>>> On 02.02.2015 at 14:55, Peter Lieven wrote:
>>>>>> On 02.02.2015 at 14:23, Kevin Wolf wrote:
>>>>>>> On 30.01.2015 at 09:42, Denis V. Lunev wrote:
>>>>>>>> fallocate() works fine and can properly handle requests of
>>>>>>>> arbitrary size. There is no point in reducing the amount of space
>>>>>>>> to fallocate. The bigger the size, the better the performance, as
>>>>>>>> the number of journal updates is reduced.
>>>>>>>>
>>>>>>>> This patch changes the behavior of both the generic filesystem and
>>>>>>>> XFS code paths, which differ in handle_aiocb_write_zeroes. For XFS,
>>>>>>>> the implementations of fallocate and xfsctl(XFS_IOC_ZERO_RANGE) are
>>>>>>>> exactly the same, so the change is fine for both paths.
>>>>>>>>
>>>>>>>> Signed-off-by: Denis V. Lunev
>>>>>>>> Reviewed-by: Max Reitz
>>>>>>>> CC: Kevin Wolf
>>>>>>>> CC: Stefan Hajnoczi
>>>>>>>> CC: Peter Lieven
>>>>>>>> CC: Fam Zheng
>>>>>>>> ---
>>>>>>>>  block/raw-posix.c | 17 +++++++++++++++++
>>>>>>>>  1 file changed, 17 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>>>>>>>> index 7b42f37..933c778 100644
>>>>>>>> --- a/block/raw-posix.c
>>>>>>>> +++ b/block/raw-posix.c
>>>>>>>> @@ -293,6 +293,20 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
>>>>>>>>      }
>>>>>>>>  }
>>>>>>>> +static void raw_probe_max_write_zeroes(BlockDriverState *bs)
>>>>>>>> +{
>>>>>>>> +    BDRVRawState *s = bs->opaque;
>>>>>>>> +    struct stat st;
>>>>>>>> +
>>>>>>>> +    if (fstat(s->fd, &st) < 0) {
>>>>>>>> +        return; /* no problem, keep default value */
>>>>>>>> +    }
>>>>>>>> +    if (!S_ISREG(st.st_mode) || !s->discard_zeroes) {
>>>>>>>> +        return;
>>>>>>>> +    }
>>>>>>>> +    bs->bl.max_write_zeroes = INT_MAX;
>>>>>>>> +}
>>>>>>> Peter, do you remember why INT_MAX isn't actually the default? I think
>>>>>>> the most reasonable behaviour would be that a limitation is only used
>>>>>>> if a block driver requests it, and otherwise unlimited is assumed.
>>>>>> The default (0) actually means unlimited or undefined. We introduced
We introduced >>>>>> that limit of 16MB in bdrv_co_write_zeroes to create only reasonable >>>>>> sized requests because there is no guarantee that write zeroes is a >>>>>> fast operation. We should set INT_MAX only if we know that write >>>>>> zeroes of an arbitrary size is always fast. >>>>> Well, splitting it up doesn't make it any faster. I think we can assume >>>>> that drv->bdrv_co_write_zeroes() wants to know the full request size >>>>> unless the driver has explicitly set bs->bl.max_write_zeroes. >>>> You mean sth like this: >>> Yes, I think that's what I meant. >> I can't find the original discussion why we added this limit. It was actually the default >> before we introduced BlockLimits. And, it was also the default in the unsupported path >> of write zeroes which created big memory allocations. This might be the reason why >> we introduced a limit. > Commit c31cb707 added the limit to bdrv_co_do_write_zeroes(). Before, we > used a bounce buffer of unbounded size. > > Anyway, it seems that none of us can think of a reason not to apply the > patch to block.c. Let's just do it, and if it does break something, > we'll figure it out. Can you send it as a proper patch? > > Denis, if we apply that patch, would you be okay with dropping 7/7 from > this series, or would still something be missing? > > Kevin Sure. This will be even better. Something similar was implemented in v1/v2 of the patchset. Regards, Den