From: Peter Lieven
Date: Thu, 21 Jul 2016 09:01:12 +0200
Subject: Re: [Qemu-devel] Regression: block: Add .bdrv_co_pwrite_zeroes()
To: Eric Blake, "qemu-devel@nongnu.org"
Cc: Kevin Wolf
References: <577A6955.6020603@kamp.de> <57900AB3.3040705@redhat.com>
In-Reply-To: <57900AB3.3040705@redhat.com>

Hi Eric,

On 21.07.2016 at 01:35, Eric Blake wrote:
> On 07/04/2016 07:49 AM, Peter Lieven wrote:
>> Hi,
>>
>> the above commit:
>>
>> commit d05aa8bb4a8b6aa9a915ec5074fb12ae632d2323
>> Author: Eric Blake
>> Date: Wed Jun 1 15:10:03 2016 -0600
>>
>> block: Add .bdrv_co_pwrite_zeroes()
>>
>> introduces a regression (at least for me).
>>
>> The Limits from the iSCSI Block Limits VPD have no requirement of being
>> a power of two.
>> We use Dell Equallogic iSCSI SANs for instance. They have an internal
>> page size of 15MB. And
>> they advertise this page size as max_ws_len, opt_transfer_len and
>> opt_discard_alignment.
> Since I don't have access to this device, let me double check: if you
> put a breakpoint in iscsi.c:iscsi_refresh_limits(), can you dump the
> contents of the struct iscsilun->bl? What is the block size of this
> device (512, 4096, something else)?

I can choose between 512 and 4096. 512 is the default.

Here are the advertised limits in the Block Limits VPD:

$ iscsi-inq -e 1 -c $((0xb0)) iscsi://XXX/0
wsnz:0
maximum compare and write length:1
optimal transfer length granularity:0
maximum transfer length:0
optimal transfer length:0
maximum prefetch xdread xdwrite transfer length:0
maximum unmap lba count:30720
maximum unmap block descriptor count:2
optimal unmap granularity:30720
ugavalid:1
unmap granularity alignment:0
maximum write same length:30720

>
> Also, while the device is advertising that the optimal discard alignment
> is 15M, that does not tell me the minimum granularity that it can
> actually discard. Can you determine that value? That is, if I try to
> discard only 1M, does that actually result in a 1M allocation hole, or
> is it ignored? It sounds like qemu should be tracking 2 separate
> values: the minimum discard granularity (I suspect this number is a
> power of 2, at least the block size, and perhaps precisely equal to the
> block size), and the maximum discard granularity that results in the
> fewest/fastest discard of the entire device (not necessarily a power of
> 2). Or, maybe that merely means that qemu's pdiscard_alignment should
> be the MINIMUM granularity, and NOT the non-power-of-2
> iscsilun->bl.opt_unmap_gran.

As far as I know there is no minimum discard granularity. Only optimum and maximum.

>
> Or put another way, I get that I can't discard more than 15M at a time.
> But I highly suspect that I do not have to align my discard requests to
> 15M boundaries. That is, if the discard granularity is 1M, then in
> qemu-io, 'discard 1M 15M' should result in a 15M hole, and should be no
> different from the result of 'discard 1M 14M; discard 15M 1M'. But if
> qemu sticks to pdiscard_alignment == iscsilun->bl.opt_unmap_gran of 15M,
> then both operations mistakenly discard nothing (because it is not
> aligned to a 15M boundary).

I do not know what the storage does internally.
But I agree the block provisioning info will not change. However, if you issue a discard 1M 15M and later a discard 0 1M, it might still report the first block as unallocated later.

Peter
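
For reference, a minimal Python sketch of the arithmetic discussed above. It is not QEMU code. It converts the advertised limits to bytes for the default 512-byte block size and shows which part of a discard request survives if requests are clipped to the 15M granularity. The helper names (is_power_of_two, aligned_subrange) are made up for illustration, and the clipping rule is only an assumption about how an alignment-enforcing block layer might behave.

```python
MiB = 1024 * 1024

# Values taken from the Block Limits VPD output quoted above.
BLOCK_SIZE = 512          # default logical block size, in bytes
OPT_UNMAP_GRAN = 30720    # "optimal unmap granularity", in blocks
MAX_UNMAP_LBAS = 30720    # "maximum unmap lba count", in blocks


def is_power_of_two(n):
    """Return True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def aligned_subrange(offset, length, align):
    """Clip [offset, offset + length) to boundaries that are multiples of align.

    Returns (start, length) in bytes. The returned length is 0 when the
    request does not cover a single fully aligned chunk.
    Hypothetical helper, not part of QEMU.
    """
    start = -(-offset // align) * align        # round offset up
    end = ((offset + length) // align) * align  # round end down
    if end <= start:
        return (start, 0)
    return (start, end - start)


gran_bytes = OPT_UNMAP_GRAN * BLOCK_SIZE
max_unmap_bytes = MAX_UNMAP_LBAS * BLOCK_SIZE

if __name__ == "__main__":
    print("granularity in MiB:", gran_bytes // MiB)
    print("power of two:", is_power_of_two(gran_bytes))
    # The three qemu-io requests from the discussion: (offset, length)
    for off, length in [(1 * MiB, 15 * MiB), (1 * MiB, 14 * MiB), (15 * MiB, 1 * MiB)]:
        start, kept = aligned_subrange(off, length, gran_bytes)
        print("discard", off // MiB, "M", length // MiB, "M -> kept bytes:", kept)
```

With these numbers the granularity comes out as 15 MiB, which is not a power of two, and none of the three requests covers a full 15M-aligned chunk, which matches the "discard nothing" case Eric describes.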