From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:51267)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1XKqOM-0006xq-EU
	for qemu-devel@nongnu.org; Fri, 22 Aug 2014 11:04:43 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1XKqOG-0001Rt-87
	for qemu-devel@nongnu.org; Fri, 22 Aug 2014 11:04:38 -0400
Received: from mx1.redhat.com ([209.132.183.28]:53097)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1XKqOG-0001Rl-0L
	for qemu-devel@nongnu.org; Fri, 22 Aug 2014 11:04:32 -0400
Received: from int-mx09.intmail.prod.int.phx2.redhat.com
	(int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22])
	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id s7MF4VlV021956
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256
	verify=OK)
	for <qemu-devel@nongnu.org>; Fri, 22 Aug 2014 11:04:31 -0400
Date: Fri, 22 Aug 2014 17:04:28 +0200
From: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20140822150428.GO32377@noname.redhat.com>
References: <1408558684-14162-1-git-send-email-mreitz@redhat.com>
	<1408558684-14162-4-git-send-email-mreitz@redhat.com>
	<20140821143126.GG4452@noname.redhat.com>
	<53F74CB4.2070606@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <53F74CB4.2070606@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v11 03/14] qcow2: Optimize bdrv_make_empty()
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Max Reitz <mreitz@redhat.com>
Cc: qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>

Am 22.08.2014 um 15:59 hat Max Reitz geschrieben:
> On 21.08.2014 16:31, Kevin Wolf wrote:
> >Am 20.08.2014 um 20:17 hat Max Reitz geschrieben:
> >>+/*
> >>+ * Calculates the number of clusters required for an L1 table for an image with
> >>+ * the given parameters plus the reftable and the refblocks required to cover
> >>+ * themselves, the L1 table and a given number of clusters @overhead.
> >>+ *
> >>+ * @ts:       Total number of guest sectors the image provides
> >>+ * @cb:       1 << @cb is the cluster size in bytes
> >>+ * @spcb:     1 << @spcb is the number of clusters per sector
> >Sectors per cluster, I guess.
> 
> Oops *g*
> 
> Now I can see why you were confused. *cough*
> 
> >>+ * @overhead: Number of clusters which shall additionally be covered by the
> >>+ *            refcount structures
> >>+ */
> >>+static uint64_t minimal_blob_size(uint64_t ts, int cb, int spcb,
> >>+                                  uint64_t overhead)
> >>+{
> >>+    uint64_t cs  = UINT64_C(1) << cb;
> >>+    uint64_t spc = UINT64_C(1) << spcb;
> >>+
> >>+    /* The following statement is a solution for this system of equations:
> >>+     *
> >>+     * Let req_cls be the required number of clusters required for the reftable,
> >>+     * all refblocks and the L1 table.
> >>+     *
> >>+     * rtc be the clusters required for the reftable, rbc the clusters for all
> >>+     * refblocks (i.e., the number of refblocks), l1c the clusters for the L1
> >>+     * table and l2c the clusters for all L2 tables (i.e., the number of L2
> >>+     * tables).
> >>+     *
> >>+     * cs be the cluster size (in bytes), ts the total number of sectors, spc
> >>+     * the number of sectors per cluster and tc the total number of clusters.
> >>+     *
> >>+     * overhead is a number of clusters which should additionally be covered by
> >>+     * the refcount structures (i.e. all clusters before this blob of req_cls
> >>+     * clusters).
> >>+     *
> >>+     * req_cls >= rtc + rbc + l1c
> >>+     *   -- should be obvious
> >The >= isn't, I would have expected =.
> 
> Leaking some clusters isn't too bad.

Fine with me. But you defined it as the "required number of clusters
required" (what?), which sounds like something smaller than the req_cls
that you calculate here.

Yes, nit-picking, but that's the part of this function that I still
understand, so let me make some use of it.

> >>+     * rbc      = ceil((overhead + req_cls) / (cs / 2))
> >>+     *   -- as each refblock holds cs/2 entries
> >Hopefully we'll never implement refcount_order != 4...
> 
> Actually, I thought about doing that just to have some fun. :-P

Sure, go ahead. You might just include it as a variable in this formula
now rather than revisiting all of this in three weeks.

> >>+     * rtc      = ceil(rbc                  / (cs / 8))
> >>+     *   -- as each reftable cluster holds cs/8 entries
> >>+     * tc       = ceil(ts / spc)
> >>+     *   -- should be obvious as well
> >>+     * l2c      = ceil(tc  / (cs / 8))
> >>+     *   -- as each L2 table holds cs/8 entries
> >>+     * l1c      = ceil(l2c / (cs / 8))
> >>+     *   -- as each L1 table cluster holds cs/8 entries
> >>+     *
> >>+     * The following equation yields a result which satisfies the constraint.
> >>+     * The result may be too high, but is never too low.
> >>+     *
> >>+     * The original calculation (without bitshifting) was:
> >>+     *
> >>+     * DIV_ROUND_UP(overhead * (1 + cs / 8) +
> >>+     *              3 * cs * cs / 16 - 5 * cs / 8 - 1 +
> >>+     *              4 * (ts + spc * cs / 8 - 1) / spc,
> >>+     *              cs * cs / 16 - cs / 8 - 1)
> >Should be obvious?
> 
> I had a PDF in the original version of this patch: https://lists.nongnu.org/archive/html/qemu-devel/2014-04/pdf5DmYjL5zXy.pdf

> >I think line 3 is for calculating l1c. That could easily be separated
> >out because it doesn't depend on any of the other variables.
> >
> >After doing some maths I'm relatively close to your formula, but I still
> >have 8 * cs instead of 5 * cs in the second line, and I also don't know
> >where you got the -1 from.
> 
> The -1 are there because we only have truncating division. As you
> can see in the PDF, I converted ceil(x / y) to floor((x + y - 1) /
> y). We could use floats and ceil(), but this might cause precision
> problems.

I see. That was too obvious. (I couldn't be bothered doing this
correctly, so I simply put +1 everywhere.)

> >And this is the reason why I asked for a method without precalculating
> >these sizes...
> 
> Either we do this or we have something slightly less complicated for
> the prealloction series and still a lot (but (much?) less
> complicated) code here.
> 
> As I said in my response to your response on v8, I still have
> rudimentary code for what you proposed but it actually seemed to
> complicated for me to do it right rather than just go on with this.
> 
> >>+     *
> >>+     */
> >>+
> >>+    return DIV_ROUND_UP(overhead + (overhead << (cb - 3)) +
> >>+                        (3 << (2 * cb - 4)) - (5 << (cb - 3)) - 1 +
> >>+                        (4 * (ts + (spc << (cb - 3)) - 1) >> spcb),
> >>+                        (1 << (2 * cb - 4)) - cs / 8 - 1);
> >>+}
> >>+

> >This may or may not work. I don't know. It's most certainly too much
> >magic for a corner case function that will need an awful lot of testing
> >to give us some confidence.
> 
> I just thought about it and reflected about why I did not pursue
> your idea. You basically suggested to mark the image dirty,
> overwrite the first few clusters with zero (one cluster for the
> reftable, one refblock, the rest for the L1 table), reset reftable
> and L1 table offset and size, allocate those clusters and remove the
> dirty flag.
> 
> This did not work because if qemu crashed between overwriting the
> first few clusters and having set up the new (empty) refcount
> structure, the image could not be repaired because the repair
> function could not allocate a refblock without overwriting the image
> header. However, with the patches for qcow2's repair function I sent
> just recently, this is not a problem any longer and the image can in
> fact be repaired.
> 
> So... I could say thanks for reminding me and sorry for having to
> read this patch; though Eric found it great fun. ;-)

It's okay, don't take my frustrated comments too seriously. We just need
to make sure that you can understand the code without reading a PDF
somewhere in the mailing list archives. Using some C variables instead
of doing everything in a single four-line expression would certainly
help, too. You can do it for the l1c part at least.

> On the other hand, this doesn't solve the prealloction problem. In
> Hu Tao's original version he had a function which calculated the
> size of a self-relfcounting blob as well, but his function was
> (obviously) wrong. As far as I remember, we do need such a function
> there and tuning it to its purpose there would make it only slightly
> less complicated. I guess I'll take a new look into his series and
> see whether we can do without.

There's another similar calculation in alloc_refcount_block(), even
though that one isn't really trying to get thing absolutely minimal. I'm
wondering if we can reuse this function there (even though a smaller
number of refblocks is needed), and if a simpler iterative calculation
like there might not be easier even if it makes the operation minimally
slower.

Though I'll freely admit that doing the maths, while causing more
headaches, is also more elegant.

> >If you still want to convince me, you need to do more to prove it's
> >correct, like improve the explanation of your formula
> 
> I'm fine with writing LaTeX into the code. :-P

Actually, if you just write the second line of your PDF into the source
code, I think that might already help a lot with understanding what's
going on.

Kevin